CONCURRENT  VALIDITY  OF  THE  SUTTER-EYBERG  STUDENT  BEHAVIOR 
INVENTORY  WITH  GRADE  SCHOOL  CHILDREN 


By 

ARISTA  RAYFIELD 


A  DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 


1997 


ACKNOWLEDGEMENTS 
I would like to thank my chair, Dr. Sheila Eyberg, for all of the time and effort she
has  given  to  this  project  and  for  all  of  the  support  she  has  given  me.  I  would  also  like 
to  thank  my  other  committee  members,  Dr.  Suzanne  Johnson,  Dr.  James  Algina,  Dr. 
Stephen  Boggs,  and  Dr.  Gary  Geffken  for  their  time  and  suggestions  which  have  greatly 
improved  this  project.  In  addition,  I  would  like  to  thank  Dr.  Mel  Lucas  and  the  Alachua 
County  School  System  for  their  cooperation,  without  which  this  project  would  not  have 
been  possible.  I  would  also  like  to  take  this  opportunity  to  thank  my  colleagues  Dan 
Edwards and Tricia Durning and research assistants Alfred Amado, Emily LeGrand, and
Bethany  Lu  for  their  assistance  with  this  project.  Finally,  I  would  like  to  thank  my 
family  for  all  of  their  support  and  encouragement. 




TABLE  OF  CONTENTS 


ACKNOWLEDGEMENTS   ii 

LIST  OF  TABLES    vi 

LIST  OF  FIGURES    viii 

ABSTRACT    x 

INTRODUCTION  1 

Assessment  of  Conduct  Problem  Behaviors  4 

Rating  Scales   5 

Clinical  Significance  7 

Behavioral  Observations   11 

Behavioral  Observation  Systems  14 

Relationship  between  Behavioral  Observation  and 

Teacher  Ratings  16 

Sutter-Eyberg  Student  Behavior  Inventory    17 

Previous  Item  Analyses   18 

Previous  Scale  Analyses  20 

Relationship  between  the  Scales  of  the  SESBI  22 

Reliability   23 

Validity  24 

Factor  Analyses  28 

Gender,  Age,  and  Ethnicity  Effects  32 

Purposes  of  the  Present  Study   34 

METHOD  35 

Participants   35 

Measures   36 

Procedure   37 

Hypotheses   38 

Analyses    39 



RESULTS  41 

Item  Analyses  41 

Item  Analyses  of  the  SESBI   41 

Item Analyses of the SESBI-R 41

Internal  Consistency   42 

SESBI  Intensity  Scale  42 

SESBI  Problem  Scale   42 

SESBI-R  Intensity  Scale  43 

SESBI-R  Problem  Scale  44 

Test  Retest  Reliability   45 

SESBI   45 

SESBI-R  45 

REdSOCS   46 

Kappa   46 

Demographic  Effects   47 

Correlations  between  Behavioral  Observations  and  SESBI  Scores   50 

Correlations  between  Behavioral  Observations 
and  the  SESBI-R  Scores  50 

Child  Behavior  Checklist- 
Teacher  Rating  Form  51 

Externalizing  Score   51 

Relationship  with  the  SESBI  59 

Relationship  with  the  SESBI-R   59 

Effects  of  Demographic  Variables  on  the  SESBI 
Intensity  Scale  60 

Effects  of  Demographic  Variables  on  the  SESBI 
Problem  Scale  63 

Effects  of  Demographic  Variables  on  the  SESBI-R 
Intensity  Scale  63 

Effects  of  Demographic  Variables  on  the  SESBI-R 
Problem  Scale  66 

Cut Off Scores 70

SESBI 70

SESBI-R   70 

CBCL-TRF  71 

Factor  Analysis  of  the  SESBI-R   71 

DISCUSSION   76 

APPENDIX  88 

REFERENCES  91 



BIOGRAPHICAL  SKETCH 


LIST  OF  TABLES 

Table  page 

1.  Cronbach's  alphas  for  the  Intensity  and 

Problem  Scales  of  the  SESBI  43 

2.  Cronbach's  alphas  for  the  Intensity  and 

Problem  Scales  of  the  SESBI-R   44 

3.  Means  and  Standard  Errors  of  Offtask  by 

Type  of  Classroom  and  Race  47 

4.  Analysis  of  Variance  for  the  Offtask  Category  48 

5.  Analysis  of  Variance  for  the  Inappropriate  Category   48 

6.  Means  and  Standard  Errors  of  Inappropriate  Category  by 

Type  of  Classroom  and  Race  49 

7.  Analysis  of  Variance  for  the  Noncomply  Category   49 

8.  Means  and  Standard  Errors  for  Noncomply  by  Grade   50 

9.  Correlations  between  Behavioral  Observations 

and  the  SESBI  51 

10.  Correlations  between  Behavioral  Observations 

and  the  SESBI-R   52 

11. Analysis of Variance for the Externalizing Raw Score

of  the  CBCL-TRF    53 

12.  Means  and  Standard  Errors  of  the  CBCL-TRF  Externalizing  Raw  Score 

by  Grade,  Race,  and  Type  of  Classroom   57 

13.  Analysis  of  Variance  for  the  Internalizing  Raw  Score 

of  the  CBCL-TRF    58 




14.  Means  and  Standard  Errors  of  the  CBCL-TRF  Internalizing  Raw  Score 

by  Grade,  Race,  and  Type  of  Classroom   58 

15.  Relationship  between  the  SESBI  and  SESBI-R  Scale 

Scores  and  the  CBCL-TRF    60 

16.  Analysis  of  Variance  Model  with  the  SESBI  Intensity  Scale    61 

17.  Analysis  of  Variance  Model  of  Fixed  Effects  on  the  SESBI  Problem  Scale  65 

18.  Analysis  of  Variance  Model  of  Fixed  Effects  on  the  SESBI-R  Intensity  Scale  68 

19.  Analysis  of  Variance  Model  of  Fixed  Effects  on  the  SESBI-R  Problem  Scale  70 

20.  Factor  Loadings  on  the  Intensity  Scale 

of  the  SESBI-R   73 

21.  Means  and  Standard  Deviations  for  Factors  1  and  2  of  the  SESBI-R  ....  71 




LIST  OF  FIGURES 


Figure page

1. Type of Classroom x Child Race x Grade Interaction with

the CBCL-TRF Externalizing Score (Kindergarten) 54

2. Type of Classroom x Child Race x Grade Interaction with

the CBCL-TRF Externalizing Score (First Grade) 54

3. Type of Classroom x Child Race x Grade Interaction with

the CBCL-TRF Externalizing Score (Second Grade) 55

4. Type of Classroom x Child Race x Grade Interaction with

the CBCL-TRF Externalizing Score (Third Grade) 55

5. Type of Classroom x Child Race x Grade Interaction with

the CBCL-TRF Externalizing Score (Fourth Grade) 56

6. Type of Classroom x Child Race x Grade Interaction with

the  CBCL-TRF  Externalizing  Score  (Fifth  Grade)   56 

7.  Type  of  Classroom  Main  Effect  for  the 

SESBI  Intensity  Scale   62 

8.  Type  of  Classroom  x  Child  Race  x  Child  Gender  x 

Teacher  Race  Interaction  with  the  SESBI  Intensity 

Scale  (Regular  classrooms)   62 

9.  Type  of  Classroom  Main  Effect  for  the 

SESBI  Problem  Scale   64 

10.  Type  of  Classroom  x  Child  Race  x  Child  Gender  x 

Teacher  Race  Interaction  with  the  SESBI  Problem 

Scale  (Regular  classrooms)   64 

11.  Type  of  Classroom  Main  Effect  for  the 

SESBI-R  Intensity  Scale  67 




12.  Type  of  Classroom  x  Child  Race  x  Child  Gender  x 

Teacher  Race  Interaction  with  the  SESBI-R  Intensity 

Scale  (Regular  classrooms)   67 

13.  Type  of  Classroom  x  Child  Gender  x  Teacher  Race 

Interaction  with  the  SESBI-R  Problem  Scale  72 




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

CONCURRENT  VALIDITY  OF  THE  SUTTER-EYBERG  STUDENT 
BEHAVIOR  INVENTORY  WITH  GRADE  SCHOOL  CHILDREN 

By 

Arista  Rayfield 
May,  1997 

Chairperson:  Sheila  M.  Eyberg 

Major  Department:  Clinical  and  Health  Psychology 

This study demonstrated the concurrent and discriminant validity of the
Sutter-Eyberg Student Behavior Inventory (SESBI) and the Sutter-Eyberg Student
Behavior Inventory-Revised (SESBI-R) with grade school children. Fifty-two teachers
completed the SESBI and the Child Behavior Checklist-Teacher Rating Form
(CBCL-TRF) on 415 children in kindergarten through 5th grade in the Alachua County
Public School system. One hundred one of the children rated were in classes for
children with exceptional student education (ESE) needs and the remainder were in
regular classes. The sample was evenly divided between African Americans and
Caucasians and between males and females. The SESBI demonstrated high coefficient
alpha within demographic subgroups and satisfactory test-retest reliability. The
SESBI scores were found to correlate significantly with observational measures of
inappropriate behavior and off-task behavior in regular classrooms. Higher SESBI
scores were related to noncompliance to teacher commands in ESE classrooms. As
hypothesized, SESBI scale scores were found to correlate significantly higher with
the externalizing scale of the CBCL-TRF than with the internalizing scale of the
CBCL-TRF. Children in ESE classes obtained significantly higher scores on the SESBI
scales. SESBI scores were found to be related to child race, teacher race, and
gender. Teachers rated boys of their own race as having fewer behavior problems
than boys of the other race. Girls were rated equally by teachers of both races.
Factor analyses supported a two-factor solution for the SESBI-R.




INTRODUCTION 
Estimates of prevalence rates of conduct problem behavior vary widely.
According to the fourth edition of the American Psychiatric Association's Diagnostic
and Statistical Manual of Mental Disorders (DSM-IV; 1994), the prevalence of
Attention Deficit Hyperactivity Disorder (ADHD) ranges from 3-5%, that of Conduct
Disorder from 4-13%, and that of Oppositional Defiant Disorder from 2-16%. ADHD is
characterized by "a persistent pattern of inattention and/or hyperactivity-impulsivity
that is more frequent and severe than is typically observed in individuals at a
comparable level of development" (American Psychiatric Association, 1994, p. 78).
There are three subtypes of ADHD: predominantly inattentive type, predominantly
hyperactive-impulsive type, and combined type. Diagnosis of any one of these
subtypes of ADHD entails the presence of the behavioral disturbance for at least six
months. For the predominantly inattentive subtype, at least six of the following
symptoms must be present: often fails to give close attention to details, often has
difficulty sustaining attention, often does not seem to listen, has difficulty
following instructions, has difficulty organizing, is often easily distracted, and is
often forgetful. For the predominantly hyperactive-impulsive type, at least six of
the following symptoms must be present: often fidgets, difficulty remaining seated,
excessive running or climbing, difficulty playing quietly, talks excessively, often
"on the go," blurts out answers to questions, difficulty awaiting


turn,  and  often  interrupts.  For  the  combined  type  the  child  must  meet  the  criteria  for 
both  the  inattentive  type  and  the  hyperactive-impulsive  type. 

Oppositional  Defiant  Disorder  refers  to  a  "recurrent  pattern  of  negativistic, 
defiant,  disobedient,  and  hostile  behavior"  (American  Psychiatric  Association,  1994, 
p.  91).  This  pattern  of  behavior  must  have  been  present  for  six  months,  and  at  least 
four  of  the  following  eight  behaviors  must  have  been  present  and  occurred  frequently: 
loses  temper;  argues  with  adults;  actively  defies  or  refuses  adult  requests  or  rules; 
deliberately  annoys  other  people;  blames  others  for  his  or  her  own  mistakes;  is  touchy 
or  easily  annoyed  by  others;  angry  and  resentful;  spiteful  or  vindictive. 

Finally,  Conduct  Disorder  refers  to  "a  repetitive  and  persistent  pattern  of  behavior 
in  which  the  basic  rights  of  others  and  major  age-appropriate  societal  norms  or  rules 
are  violated"  (American  Psychiatric  Association,  1994,  p.  85).  There  are  two 
subtypes of Conduct Disorder, Childhood-onset and Adolescent-onset. Children
diagnosed with Conduct Disorder before age 10 are considered Childhood-onset type
and are more likely to have persistent Conduct Disorder and to develop adult
Antisocial Personality Disorder. The pattern of behavior must have been present for
at least six months, and at least three of the following fifteen behaviors must have
been present: bullies, threatens, or intimidates others; often stays out late at night
without parental permission; has stolen without confrontation of victim; has run away
from home overnight (at least twice, or once without returning); lies; sets fires; often
truant  from  school;  has  broken  into  someone  else's  house,  building,  or  car; 
deliberately  destroys  others'  property;  physically  cruel  to  animals;  has  forced  another 


person  into  sexual  activity;  has  used  a  weapon  in  more  than  one  fight;  initiates 
physical  fights;  has  stolen  with  confrontation  of  victim;  physically  cruel  to  people. 

Disruptive  behaviors  are  the  most  frequently  occurring  problems  for  children 
(Wells  &  Forehand,  1985;  Quay,  1986)   and  account  for  most  of  the  mental  health 
clinic  referrals  received  for  children  (Kazdin,  Siegel,  &  Bass,  1990;  Schuhmann, 
Durning,  Eyberg,  &  Boggs,  in  press).  There  are  several  diagnoses  that  occur 
comorbidly  with  the  conduct  problem  behaviors  including  learning  disabilities, 
depression, dysthymia, and anxiety disorders (Platzman et al., 1992; APA, 1994). If
left  untreated,  many  children  diagnosed  with  disruptive  behavior  disorders  have  an 
increased  risk  of  engaging  in  delinquent  or  criminal  behavior  (Loeber,  1982;  Robins, 
1970).  Loeber  (1982)  concluded  that  children  whose  behavior  problems  had  an  early 
onset  and  occurred  at  a  high  frequency  in  multiple  settings  had  the  greatest  risk  for 
continued antisocial behavior. The DSM-IV (APA, 1994) further indicates that the
severity  of  a  child's  disruptive  behavior  is  not  only  related  to  the  number  of  behavior 
problems  but  also  the  number  of  situations  in  which  they  occur.  Robins  (1966)  found 
that  40  percent  of  conduct  disordered  children  were  diagnosed  with  antisocial  or 
sociopathic  disturbances  as  adults,  while  only  10  percent  of  their  non-conduct 
disordered  peers  received  such  a  diagnosis.  Not  only  are  children  with  disruptive 
behavior disorders at higher risk for continued antisocial behavior, but they also tend
to  be  at  higher  risk  for  other  psychiatric  disorders,  poor  occupational  adjustment, 
marital  distress,  and  poorer  physical  health  (McMahon  &  Wells,  1989). 


Assessment  of  Conduct  Problem  Behavior 
It  has  become  apparent  that  for  proper  assessment  of  children  with  disruptive 
behavior  disorders,  multiple  methods  such  as  interview,  behavioral  observations,  and 
behavior  rating  scales  must  be  used  as  well  as  multiple  informants  including  the 
child's  parents  and  teachers  (Wells  &  Forehand,  1985;  Achenbach,  McConaughy,  & 
Howell,  1987).  Because  the  diagnosis  of  ADHD  now  requires  that  the  behavior  be 
problematic  in  more  than  one  setting,  assessment  of  school  behavior  has  become 
imperative.  Multiple  measures  ensure  capturing  cross-situational  as  well  as  situation- 
specific  behaviors  (Platzman  et  al.,  1992).  The  modest  correlations  found  between 
informants  (Achenbach,  McConaughy,  &  Howell,  1987)  suggest  that  children  behave 
differently  in  different  situations.  In  addition,  the  same  behaviors  may  or  may  not  be 
viewed  as  problematic  by  different  people,  depending  on  their  tolerance  level  or  their 
personal  normative  expectations  of  children. 

Teachers  are  one  important  source  of  information,  because  they  spend  a 
considerable  amount  of  time  with  the  child  but  are  less  personally  involved  than 
parents  and  usually  have  a  good  sense  of  age-appropriate  behavior  (Funderburk  & 
Eyberg,  1989).  In  addition,  teacher  ratings  have  been  found  to  be  more  reliable  than 
parent  ratings  in  identifying  ADHD  children  (Loeber,  Green,  &  Lahey,  1990). 
Therefore  it  is  important  to  develop  well-validated  measures  for  teachers. 

Broad-based assessment can be very costly. Many researchers suggest that
cost-effective behavior rating scales be used for the initial assessment and the more


expensive  methods,  such  as  school  observation  coding  systems,  be  used  only  if  further 
assessment is needed (McMahon & Wells, 1989; Schuhmann et al., in press).

Rating  Scales 

There  are  several  standards  that  rating  scales  must  meet  in  order  to  be 
psychometrically  sound  and  clinically  useful  (Barkley,  1990).  Items  on  the  scale 
should  be  worded  clearly  so  that  the  respondent  knows  the  behavior  that  is  being 
rated.  The  more  specific  and  clear  the  wording,  the  greater  the  reliability  of  the 
instrument  will  be. 

A  rating  scale  should  contain  a  sufficient  number  of  items  so  as  to  adequately 
sample  the  behavioral  construct  or  constructs  of  interest  (Barkley,  1990).  The  rating 
scale,  however,  should  not  be  so  time-consuming  that  it  discourages  respondents  from 
completing  it. 

The  response  format  should  have  a  sufficient  range  (Barkley,  1990).  Scales 
with  only  "yes-no"  answer  formats  can  rarely  make  the  fine  discriminations  in  the 
frequency  and  severity  of  behaviors  required  to  discriminate  between  clinical  and 
normal  samples.  Also,  the  descriptors  given  to  the  anchor  points  of  the  item  ratings, 
such  as  "never"  or  "always,"  should  be  as  clear  as  possible,  because  the  reliability  of 
the  scale  will  decrease  as  the  descriptors  become  more  subjective  (Barkley,  1990). 

Anastasi  (1988)  defines  validity  as  "what  the  test  measures  and  how  well  it  does 
so"  (p.  139).  Barkley  (1990)  suggests  that  a  rating  scale  should  demonstrate  face 
validity;  the  item  content  should  appear  to  reflect  the  construct  that  the  instrument 
purports  to  measure.  Face  validity  is  desirable  for  public  relations  and  rapport 


(Anastasi,  1988).  If  a  rating  scale  appears  to  be  irrelevant  or  silly,  cooperation  by  the 
rater  may  be  poor  (Anastasi,  1988).  In  addition,  a  rating  scale  should  have  concurrent 
validity;  it  should  be  shown  to  correlate  with  other  measures  of  the  same  construct 
including  measures  using  different  methods,  such  as  behavioral  observation.  Another 
type  of  validity  required  for  rating  scales  is  discriminant  validity,  or  the  ability  to 
discriminate  accurately  between  clinical  and  nonclinical  samples.  To  have  true 
discriminant  validity,  Campbell  and  Fiske  (1959)  suggested  that  rating  scales  should 
converge  with  other  measures  assessing  the  same  trait  but  should  not  converge  with 
measures  that  assess  a  different  trait.  Perhaps  the  most  important  type  of  validity  is 
predictive  validity,  which  means  that  the  rating  scale  would  correlate  with  behavior 
some  time  later  in  development  (Barkley,  1990). 
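
The convergent and discriminant pattern described above can be illustrated with a brief Python sketch. It computes Pearson correlations between a hypothetical teacher rating scale and two other hypothetical measures, one intended to assess the same trait and one intended to assess a different trait. All scores and variable names are invented for illustration and are not data from any study cited here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six children (assumed values, for illustration only).
rating_scale = [10, 20, 30, 40, 50, 60]      # teacher rating of disruptive behavior
same_trait = [12, 18, 33, 41, 48, 63]        # e.g., observed disruptive behavior
different_trait = [5, 7, 4, 6, 5, 6]         # e.g., a measure of an unrelated trait

r_convergent = pearson_r(rating_scale, same_trait)
r_discriminant = pearson_r(rating_scale, different_trait)

# Evidence in the Campbell and Fiske sense: a high correlation with the
# same-trait measure and a low correlation with the different-trait measure.
print(f"same trait: r = {r_convergent:.2f}")
print(f"different trait: r = {r_discriminant:.2f}")
```

In this invented example the rating scale converges with the same-trait measure and is nearly unrelated to the different-trait measure, which is the pattern a scale with discriminant validity would be expected to show.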

The  reliability  of  a  rating  scale  over  time  and  between  raters  must  also  be 
satisfactory. Reliability is a prerequisite for validity. Finally, Barkley (1990) suggests
that  it  would  be  useful  if  a  scale  could  demonstrate  "prescriptive"  utility.  This  refers 
to  the  ability  of  the  scale  to  determine  which  types  of  treatment  might  be  most 
successful  with  which  types  of  patients. 

In  the  absence  of  additional  interpretative  data,  raw  scores  from  rating  scales  are 
meaningless  (Anastasi,  1988).  For  scores  to  be  meaningful,  one  needs  a  clearly 
defined  and  uniform  frame  of  reference.  In  psychological  testing,  the  most  common 
frame  of  reference  used  is  normative  data.  Norms  are  empirically  established  by 
determining  how  persons  in  a  representative  sample  score  on  the  rating  scale 
(Anastasi,  1988).  Few  tests  are  standardized  as  well  as  is  popularly  assumed 


(Anastasi,  1988).  Variables,  such  as  SES,  that  limit  the  generalizability  of  the  data 
need  to  be  made  explicit  (Kazdin,  1977)  and  be  taken  into  account  by  the  test  user 
(Anastasi,  1988). 

Rating  scale  measures  offer  unique  advantages  in  the  multi-modal  assessment 
of  behavior.  They  are  particularly  useful  for  covering  a  broad  range  of  behaviors, 
including  low  frequency  behaviors.  In  addition,  rating  scales  require  relatively  little 
time  to  administer  and  score,  they  are  easily  quantifiable,  they  lend  themselves  to 
normative  data  and  finally,  they  incorporate  the  perceptions  of  significant  individuals 
in  the  child's  life  (McMahon  &  Forehand,  1988).  Behavior  rating  scales  are 
considered  to  be  excellent  measures  of  parent  and  teacher  perceptions  of  the  child  and 
are  extensively  used  as  treatment  outcome  and  social  validation  measures  in  treatment 
studies  (McMahon  &  Wells,  1989). 

Clinical  Significance 

In  treatment  outcome  studies,  a  test  of  statistical  significance  of  differences 
between  the  means  of  rating  scale  scores  at  pre-  and  posttreatment  is  often  used  to 
determine  if  there  has  been  a  change  in  the  behavior  of  the  child.  The  above  method 
is  problematic  for  at  least  two  reasons:  First,  the  statistical  tests  of  group  differences 
provide  no  information  on  the  variability  of  response  to  the  treatment  that  has  been 
administered,  yet  this  information  is  important  to  clinicians  (Jacobson  &  Truax,  1991; 
Jacobson, Follette, & Revenstorf, 1984). Second, a treatment may produce
statistically  significant  treatment  effects  when  only  half  of  the  subjects  improved.  In 
fact,  statistical  tests  treat  intra-subject  and  inter-subject  variability  as  error,  not  as 


information  (Hugdahl  &  Ost,  1981).  Statistical  tests  tell  us  that  the  effect  we  see  is 
not simply due to chance but cannot tell us the importance or meaningfulness of the
change.  For  example,  a  weight  loss  of  2  pounds  per  person  in  a  weight  loss  treatment 
and  no  change  of  weight  in  a  control  group  would  yield  highly  statistically  significant 
results  yet  not  be  a  clinically  significant  change  (Jacobson,  Follette,  &  Revenstorf, 
1984,  1986). 
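
The weight loss example can be made concrete with a short Python sketch. The group sizes, standard deviations, and the threshold for a meaningful loss are all assumptions chosen for illustration; none of them come from the studies cited above. The sketch computes Welch's t statistic from summary statistics and compares the mean change with the assumed threshold.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent groups, from summary statistics."""
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Hypothetical summary data in pounds lost (assumed values).
treat_mean_loss, treat_sd, treat_n = 2.0, 3.0, 200
control_mean_loss, control_sd, control_n = 0.0, 3.0, 200

# Assumed minimum loss that a clinician would regard as meaningful.
MIN_MEANINGFUL_LOSS = 10.0

t_stat = welch_t(treat_mean_loss, treat_sd, treat_n,
                 control_mean_loss, control_sd, control_n)

# 1.96 is the large-sample two-tailed criterion at the .05 level (approximate).
statistically_significant = abs(t_stat) > 1.96
clinically_meaningful = treat_mean_loss >= MIN_MEANINGFUL_LOSS

print(f"t = {t_stat:.2f}, statistically significant: {statistically_significant}")
print(f"mean loss meets the assumed clinical threshold: {clinically_meaningful}")
```

Under these assumed values the group difference is statistically significant even though the average loss falls well short of the assumed clinical threshold.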

There  have  been  several  suggested  ways  to  define  clinical  significance 
including  a  large  proportion  of  the  clients  improving,  a  change  which  is  large  in 
magnitude,  an  improvement  in  the  client's  everyday  functioning,  a  change  which  is 
recognizable  to  peers  and  significant  others,  an  elimination  of  the  presenting  problem, 
and  the  attainment  of  a  level  of  functioning  which  is  no  longer  distinguishable  from 
peers  (Jacobson,  Follette,  &  Revenstorf,  1984).  Although  there  is  no  agreed  upon 
definition,  the  one  idea  the  suggestions  have  in  common  is  that  change  should  be 
practically  important  and  not  based  on  a  statistical  test  alone. 

Jacobson  and  colleagues  (1984)  suggested  that  psychotherapy  research  needs 
agreed-upon  conventions  for  determining  what  constitutes  improvement  in  individuals 
and  a  general  consensus  as  to  what  is  meant  by  clinical  significance.  They  suggest  the 
definition  should  be  applicable  to  a  wide  variety  of  problems  and  be  both 
psychometrically  and  clinically  sound.  One  way  to  conceptualize  clinical  significance 
is  returning  to  normal  functioning  (Kazdin,  1977;  Jacobson,  Follette,  &  Revenstorf, 
1986).  Jacobson  and  colleagues  (1986)  discussed  three  alternative  ways  to  define 
clinically  significant  change  following  therapy: 


I.  The  level  of  functioning  should  fall  outside  the  range  of  the  dysfunctional 
population  (i.e.  2  standard  deviations  below  the  dysfunctional  mean). 

II.  The  level  of  functioning  should  fall  within  the  range  of  the  functional 
population  (i.e.  within  2  standard  deviations  of  the  functional  mean). 

III.  The  level  of  functioning  should  place  the  client  closer  to  the  mean  of  the 
functional  population  than  the  mean  of  the  dysfunctional  population  (pp.  339- 
342). 
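
The three definitions above can be sketched as cutoff computations in Python. The sketch assumes a scale on which higher scores indicate more dysfunction, and the means and standard deviations are hypothetical. For the third definition it uses a standard-deviation-weighted point between the two means, a formalization commonly associated with Jacobson and Truax (1991); this is treated here as an assumption, and it reduces to the simple midpoint when the two standard deviations are equal.

```python
def cutoff_a(mean_dys, sd_dys):
    """Method I: 2 SDs below the dysfunctional mean (toward functionality)."""
    return mean_dys - 2 * sd_dys

def cutoff_b(mean_func, sd_func):
    """Method II: upper edge of the functional range, 2 SDs above its mean."""
    return mean_func + 2 * sd_func

def cutoff_c(mean_func, sd_func, mean_dys, sd_dys):
    """Method III: SD-weighted point between the two means (assumed formula)."""
    return (sd_func * mean_dys + sd_dys * mean_func) / (sd_func + sd_dys)

# Hypothetical population values (assumed, for illustration only).
MEAN_FUNC, SD_FUNC = 100.0, 20.0
MEAN_DYS, SD_DYS = 160.0, 30.0

a = cutoff_a(MEAN_DYS, SD_DYS)
b = cutoff_b(MEAN_FUNC, SD_FUNC)
c = cutoff_c(MEAN_FUNC, SD_FUNC, MEAN_DYS, SD_DYS)

def meets_criterion(post_score, cutoff):
    """With higher scores meaning more dysfunction, a score at or below
    the cutoff meets the criterion."""
    return post_score <= cutoff

print(f"cutoff a = {a}, cutoff b = {b}, cutoff c = {c}")
print("post-treatment score of 120 meets c:", meets_criterion(120.0, c))
```

With these assumed values the three methods give different cutoffs for the same scale, which is why the choice among them matters.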

Jacobson and colleagues (1986) note that the first method does not take into
account normal functioning and is therefore the least desirable method. The choice
between the remaining two methods depends on the score distributions of the specific
populations studied. If the distributions of the functional and dysfunctional
populations do not overlap, then the second method would be the most desirable method.

Finally, if the functional and the dysfunctional populations do overlap, the third
method is the preferred method. In addition, Jacobson and colleagues (1986) suggest
that the third method is also the least arbitrary of the methods. It provides an
objective and unbiased system for measuring change that does not depend on the
specific disorder, so it has broad applicability (Jacobson, Follette, & Revenstorf, 1986).

Several  criticisms  of  the  methods  of  defining  clinical  significance  described 
above  have  arisen.  First,  all  three  assume  that  there  are  distinct  distributions  for  the 
dysfunctional  and  functional  populations.  An  alternative  hypothesis  is  that  there  is  one 
underlying  distribution  and  the  dysfunctional  individuals  are  at  the  tails  of  the 
distribution  (Wampold  &  Jensen,  1986;  Hollon  &  Flick,  1988).  A  second  criticism 


levied  is  that  there  could  potentially  be  a  different  cut  off  score  for  each  study 
depending  on  the  severity  of  the  dysfunctional  population  (Wampold  &  Jensen,  1986; 
Hollon & Flick, 1988). There has also been some question about how the functional
population  should  be  defined.  Jacobson  and  colleagues  (1984)  originally  suggested 
that  the  functional  population  should  contain  no  dysfunctional  members.  Hollon  and 
Flick  (1988),  on  the  other  hand,  suggested  that  the  population  contain  individuals  in 
all  categories  but  in  proportion  to  their  actual  representation  in  the  population  at  large. 
Finally,  it  is  not  known  how  robust  these  methods  would  be  to  non-normal 
distributions. 

Jacobson  and  colleagues  (1986)  responded  to  the  criticisms  of  Wampold  and 
Jensen  (1986)  by  saying  that  regardless  of  the  nature  of  the  underlying  distribution, 
there  are  two  distinct  groups  of  interest,  those  seeking  or  needing  help  for  a  particular 
disorder  and  those  who  are  not.  These  two  groups  are  distinguishable  on  a  variety  of 
dimensions,  and  as  long  as  two  distinguishable  groups  exist,  it  is  possible  to  identify  a 
point  at  which  an  individual  is  equally  likely  to  be  a  member  of  either  group.  Having 
two  distinct  groups  is  a  necessary  and  sufficient  condition  for  establishing  cutoff 
points.  People  who  seek  treatment  are  a  self-selected  group  who  probably  form  one 
end  of  some  underlying  distribution.    When  scores  are  plotted  and  compared  to 
normal  individuals,  however,  the  resulting  histograms  are  often  bimodal.  Jacobson 
and  colleagues  (1986)  concluded  that  the  two-distribution  model  fits  the  population  for 
which treatment was designed far better than the one-distribution model. Jacobson
and colleagues (1986) also noted that cutoff scores should be calculated using carefully collected


samples  of  functional  and  dysfunctional  populations,  so  that  they  would  be 
generalizable  to  various  samples.  Jacobson  and  Truax  (1991)  discussed  the  issue  of 
selecting  the  two  samples.  They  noted  that  there  would  naturally  be  a  wide  range  of 
scores  in  the  functional  population  and  that  high  scores  should  not  necessarily  be 
excluded,  but  that  individuals  in  the  functional  group  should  not  be  seeking  treatment. 
In  conclusion,  it  seems  that  the  method  for  obtaining  the  cut  off  score  would  in  part 
help  determine  the  type  of  population  obtained.  If  using  the  third  method  that  utilizes 
a  functional  and  dysfunctional  mean,  it  seems  that  one  would  obtain  a  sample  of 
individuals  not  currently  seeking  treatment  for  the  functional  group.  If,  however,  one 
planned  to  use  the  second  method  for  calculating  a  cut  off  score,  a  sample  with 
comparable  proportions  of  individuals  seeking  therapy  to  that  in  the  population  would 
be  more  desirable. 

Behavioral  Observation 

Behavioral  observation  is  an  important  component  of  multi-method  assessment. 
Observational  measures  have  been  used  to  assess  the  environmental  determinants  of 
behavior  and  to  assess  treatment  effects  (Sattler,  1986).  In  addition,  observational 
methods  help  refine  other  measurement  techniques,  such  as  rating  scales,  by  validating 
the  constructs  the  scales  attempt  to  measure  (Platzman  et  al.,  1992). 

There  are  several  ways  to  record  behavioral  observations.  One  method  is 
narrative  recording  which  may  be  either  anecdotal  or  running  recording  (Sattler, 
1986).  This  type  of  observation  requires  no  particular  codes  or  time  frame  in  which 
to code. In anecdotal recording, only behaviors that are thought to be important are recorded.


In  running  recording,  all  behaviors  that  occur  are  recorded  at  the  time  they  occur. 
The  narrative  method  gives  a  great  deal  of  information,  but  is  not  as  easily 
quantifiable  as  other  methods,  and  is  therefore  not  as  useful  for  research.  In 
addition,  this  method  is  difficult  to  validate,  and  does  not  fully  describe  some  types  of 
behaviors  that  may  be  critical.  The  findings  have  limited  generalizability,  and  the 
findings  are  only  as  good  as  the  observer. 

A  second  type  of  observation  method  is  interval  recording  in  which  certain 
behaviors  are  recorded  during  specified  time  intervals  (Sattler,  1986).  There  are 
several  different  types  of  interval  recording.  In  partial-interval  recording,  a  behavior 
is  scored  only  once  during  a  time  frame  no  matter  how  long  it  lasts  or  the  number  of 
times  it  occurs.  In  whole-interval  time  sampling  the  behavior  is  scored  only  if  it  lasts 
through  the  entire  time  interval.  There  are  several  advantages  to  interval  recording 
compared to narrative recording. Interval recording is easier to quantify, uses
time efficiently, permits the recording of almost any behavior, and is useful for
checking inter-observer reliability. In addition, it helps to ensure that the
predefined behaviors are observed under the same conditions each time. The
disadvantage of this type of recording is that it provides a somewhat artificial view
of the behavior because the time interval, and not the behavior, dictates the
recording framework. Further, it may allow certain important behaviors related to the
problem to be overlooked, such as the antecedent or consequence of the behavior.
Interval recording also may not reveal the actual frequency of the behavior. For
example, a behavior may occur five times in a 15-second interval and be scored only
once. Frequency information may be very


13 

important for research purposes, and interval recording may overestimate low-rate
behaviors and underestimate high-rate behaviors.
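
The difference between the partial-interval rule, the whole-interval rule, and a simple event count can be illustrated with a short sketch. The function name, the session length, and the event times below are hypothetical and chosen only for illustration; they are not taken from Sattler (1986) or from any observation system discussed here.

```python
def score_intervals(events, session_length, interval_length):
    """Score one target behavior with partial- and whole-interval rules.

    events: list of (onset, offset) times in seconds for each occurrence.
    Returns two lists of booleans, one entry per interval.
    """
    n_intervals = int(session_length // interval_length)
    partial, whole = [], []
    for i in range(n_intervals):
        start = i * interval_length
        end = start + interval_length
        # Partial-interval: scored once if the behavior occurs at any point
        # in the interval, however long it lasts or how often it occurs.
        partial.append(any(on < end and off > start for on, off in events))
        # Whole-interval: scored only if a single occurrence spans the
        # entire interval.
        whole.append(any(on <= start and off >= end for on, off in events))
    return partial, whole


# Hypothetical session: five brief occurrences in the first 15 seconds,
# then one long occurrence lasting from second 30 to second 62.
events = [(1, 2), (4, 5), (7, 8), (10, 11), (13, 14), (30, 62)]
partial, whole = score_intervals(events, session_length=75, interval_length=15)

event_count = len(events)      # what event recording would report
partial_count = sum(partial)   # intervals scored under the partial rule
whole_count = sum(whole)       # intervals scored under the whole rule
print(event_count, partial_count, whole_count)
```

In this made-up session the five brief occurrences in the first interval are scored once under the partial-interval rule, whereas the single long occurrence is scored in three separate intervals. The interval tally therefore reflects the time grid as much as the behavior itself.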

A third method of recording behavioral observations is event recording, in which each
instance  of  a  specific  behavior  or  event  is  recorded  as  it  occurs  during  the 
observational  period  (Sattler,  1986).  As  with  interval  recording,  behavior  is  sampled. 
The  unit  of  measurement,  however,  is  the  behavior  and  not  the  time  interval.  Event 
recording  is  also  easily  quantified,  permits  the  recording  of  most  behaviors,  and  is 
useful  for  calculating  inter-observer  reliability.  It  is  less  appropriate  for  behaviors  that 
occur  very  rapidly  or  are  long  in  duration. 

Finally,  rating  recording  is  conducted  by  rating  behaviors  on  a  checklist  after 
an observational period (Sattler, 1986). Rating recording has much greater rater
subjectivity  and  is  thus  less  reliable.  It  is  often  less  costly  than  other  methods  but  is 
not  as  easily  quantified  and  is  thus  less  suitable  for  research  purposes. 

In  behavioral  observation  it  is  important  for  research  purposes  that  the  behavior 
being  observed  is  defined  in  objective,  clear,  and  complete  terms  (Sattler,  1986).  The 
definition should help the observer recognize when the behavior occurs and
to  distinguish  it  from  other  similar  behaviors. 

Inter-observer  agreement  is  an  important  requisite  for  effective  direct  observation. 
Inter-observer  agreement  is  found  to  be  high  in  many  studies,  yet  when  observers  do 
not know that their data will be used for reliability checks, the reliability drops
(Weinrott & Jones, 1984). This finding suggests that reported reliability rates in
many  studies  may  be  inflated  and  emphasizes  the  importance  of  reliability  checks  on 
as  many  observations  as  possible. 
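
As a rough illustration of how inter-observer agreement might be computed for interval data, the sketch below calculates interval-by-interval percent agreement. It also calculates Cohen's kappa, a standard chance-corrected index that is added here for comparison and is not discussed in the sources cited above. The two observer records are hypothetical.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Percent of intervals in which two observers recorded the same code."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * agreements / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(obs_a)
    p_observed = percent_agreement(obs_a, obs_b) / 100.0
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    # Chance agreement: product of each observer's base rate per code.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # undefined when both records are constant
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical records: 1 = behavior scored in the interval, 0 = not scored.
observer_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
observer_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(agreement, round(kappa, 2))
```

Percent agreement is easy to interpret but can look high simply because a behavior is rare, which is one reason a chance-corrected index is sometimes reported alongside it.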

Classroom  observations  are  particularly  powerful  measures  of  child  behavior. 
Because of time and expense, however, these are rarely conducted. In their review of
direct observation with children with ADHD, Platzman and colleagues (1992) found
that classroom observations were more likely to yield significant effects than
laboratory observation. Therefore, classroom observations are a particularly good way
to validate other measures, such as rating scales.

Behavioral  Observation  Systems 
In their review of studies using direct observation for behavioral problems,
Platzman and colleagues (1992) found that several categories, such as off-task
behavior, motor movement, and negative vocalizations, appeared to be most
predictive of behavior problems. They also concluded that there were very few standardized
direct observation systems and suggested several directions for future research with
direct observation, including limiting age spans or using age as an independent variable
and using structured classroom settings more frequently, as they are likely to maximize
symptom display by children. They also noted that there was very little research on
girls in the direct observation literature. Platzman and colleagues indicated the
importance of operationalizing all behaviors measured to assist in future replication.
Finally, they suggested that future research collect normative data so that results can
be compared and be valid across samples and situations.


One  classroom  observational  system  that  has  been  widely  used  is  the  Classroom 
Observation  Code  (Abikoff,  Gittelman-Klein,  &  Klein,  1977).  The  system  has  11 
categories  and  is  an  interval  observation  system  (Abikoff  &  Gittelman,  1985). 
Observations  always  occur  in  structured  didactic  situations  or  periods  of  independent 
academic  work  with  the  supervision  of  the  teacher.  This  system  has  been  shown  to 
discriminate  between  children  with  and  without  the  diagnosis  of  ADHD  and  to  be 
sensitive  to  treatment  effects  (Abikoff,  Gittelman,  &  Klein,  1980).  One  limitation  to 
this  system  is  the  amount  of  training  time  necessary  to  obtain  acceptable  reliability. 
For  example,  in  one  study  the  authors  reported  that  for  observers  to  achieve  70% 
agreement  it  took  approximately  50  hours  of  training  and  only  5  out  of  8  graduate  and 
senior  psychology  students  were  able  to  obtain  acceptable  reliability  (Abikoff, 
Gittelman-Klein,  &  Klein,  1977).  It  is  known  that  reliability  varies  inversely  with 
complexity  of  the  observation  system  (Reid,  Patterson,  Baldwin,  &  Dishion,  1988). 
Therefore, a simpler system that is also valid would be desirable.

At the University of Florida, a school observation system called the Revised Edition
of the School Observation Coding System (REdSOCS) has recently been
developed. The system was originally based on school observation systems used by
Forehand  and  colleagues  (Breiner  &  Forehand,  1981)  and  Walker  and  colleagues 
(Walker,  Shinn,  O'Neill,  &  Ramsey,  1986).  Categories  from  the  two  systems  were 
selected  by  McNeil  et  al.  (1991).  Modifications  were  made  to  the  system  by 
Rigelhaupt,  Boggs,  Eyberg,  and  Edwards  (1994)  for  purposes  of  definition 
clarification. 


REdSOCS  is  an  interval  coding  system  designed  to  measure  behavior  of  children 
in  preschool,  kindergarten,  and  elementary  school.  There  are  three  general  categories 
of behavior that are coded in this system: 1) appropriate versus inappropriate behavior,
2) comply versus noncomply, and 3) on-task versus off-task. The percent of
inappropriate behavior, percent noncompliance, and percent time off-task are then
calculated. High inter-observer agreement has been demonstrated (McNeil et al.,
1991; Rigelhaupt et al., 1994). In addition, Rigelhaupt et al. (1994) examined
concurrent  validity  using  the  SESBI  and  found  that  the  Problem  Scale  scores 
correlated  significantly  with  the  noncomply  category  and  that  the  Intensity  Scale  scores 
correlated  significantly  with  all  three  categories.  As  is  typically  found  when  using 
different  methods  (rating  scale,  observation)  and  different  informants  (teachers, 
observers),  the  correlations  were  generally  modest  (.23  to  .34).  It  should  be  noted 
that this study was conducted with children ages 3 to 6, often in unstructured
classrooms. It will be important to see if more structured classrooms improve
correlations  between  REdSOCS  and  teacher  rating  scales. 
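
The three percentage scores described above can be sketched as simple proportions of coded intervals. This is only an illustration under stated assumptions: the code labels, the use of `None` for intervals in which a category could not be scored (for example, no command was given), and the choice of denominator are hypothetical and are not taken from the REdSOCS manual.

```python
def percent_of_intervals(codes, target):
    """Percent of scorable intervals in which `target` was coded.

    Intervals coded None (assumed here to mean "not scorable") are
    excluded from the denominator. Returns None if nothing was scorable.
    """
    scored = [c for c in codes if c is not None]
    if not scored:
        return None
    return 100.0 * sum(c == target for c in scored) / len(scored)


# Hypothetical record of ten intervals for one child.
behavior = ["appropriate"] * 8 + ["inappropriate"] * 2
compliance = ["comply", "noncomply", None, None, "comply",
              "comply", None, "noncomply", None, None]
task = ["on-task"] * 7 + ["off-task"] * 3

pct_inappropriate = percent_of_intervals(behavior, "inappropriate")
pct_noncomply = percent_of_intervals(compliance, "noncomply")
pct_off_task = percent_of_intervals(task, "off-task")
print(pct_inappropriate, pct_noncomply, pct_off_task)
```

Whether noncompliance is expressed relative to all intervals or only to intervals containing a command changes the resulting percentage considerably, so the denominator would need to be confirmed against the coding manual before any comparison across studies.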

Relationship  Between  Behavioral  Observations  and  Teacher  Ratings 

Kazdin,  Esveldt-Dawson,  and  Loar  (1983)  examined  the  relationship  between 
direct  observation  and  teacher  ratings  with  32  psychiatric  inpatients  ranging  in  age 
from  7  to  13.  The  informants  on  the  rating  scales  were  4  teachers,  and  each  child 
was observed for a total of 15 minutes. Children in their study were rated by their
teachers  as  being  more  on-task  and  less  disruptive  than  observational  data  suggested. 


In  a  meta-analytic  study,  Achenbach,  McConaughy,  and  Howell  (1987) 
examined  the  correlations  found  among  different  informants  including  parents, 
teachers,  observers,  and  self-ratings.  They  found  that  the  mean  correlation  between 
teachers  and  observers  (.42)  was  the  highest  correlation  between  different  informants. 
Rather  than  interpret  the  low  correlations  as  evidence  of  poor  reliability,  the  authors 
suggested  that  the  different  informants  experience  children  differently  and  that  each 
informant  adds  unique  information.  These  findings  support  the  need  for  multisource 
assessment  of  children.  Other  factors  that  have  been  found  to  influence  correlations 
between  direct  observation  and  teacher  report  are  age  and  problem  type.  Measures  of 
younger  children  are  more  likely  than  measures  of  older  children  or  adults  to  show 
greater  agreement  across  informants,  and  externalizing  problems  yield  significantly 
greater  agreement  than  internalizing  problems  (Achenbach,  McConaughy,  &  Howell, 
1987). 

Achenbach, McConaughy, and Howell (1987) also reviewed the agreement
between self-ratings and others' ratings on the same trait. They found the mean
correlation to be quite low (.22) and even lower between teacher ratings and student
self-ratings (.20). Achenbach and colleagues (1987) suggest that children above the age of
10  can  be  important  sources  of  information  on  their  own  behavior  and  moods,  but 
younger  children  generally  are  not  reliable  informants  in  regard  to  their  behavior. 

Sutter-Eyberg  Student  Behavior  Inventory 

The  Sutter-Eyberg  Student  Behavior  Inventory  (SESBI)  (Eyberg,  in  press)  was 
designed to be a unidimensional measure of disruptive school behavior in children
between  the  ages  of  2  and  16.  The  SESBI  contains  36  disruptive  behavior  items  that 
are observable by teachers. Each behavior is rated on two dimensions: its frequency
of occurrence and its identification as a problem. Teachers rate how often the
behavior occurs on a 7-point Likert scale ranging from (1) "never" to (7) "always."
The item ratings are summed to yield the intensity score, with a potential range of 36
to 252. Teachers also indicate whether the behavior is currently a problem on a
yes-no scale, yielding a problem score which is the sum of the yes responses, with a
possible range of 0 to 36.
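
The scoring rule just described can be written out as a minimal sketch. The function name and the input format are assumptions made for illustration; only the item count, the 1 to 7 rating range, and the two sums come from the description above.

```python
N_ITEMS = 36  # number of SESBI items, as described in the text


def score_sesbi(frequency_ratings, problem_flags):
    """Return (intensity_score, problem_score) for one completed form.

    frequency_ratings: 36 integers from 1 ("never") to 7 ("always").
    problem_flags: 36 booleans, True where the teacher answered "yes".
    """
    if len(frequency_ratings) != N_ITEMS or len(problem_flags) != N_ITEMS:
        raise ValueError("expected ratings and flags for all 36 items")
    if any(r not in range(1, 8) for r in frequency_ratings):
        raise ValueError("frequency ratings must be between 1 and 7")
    intensity = sum(frequency_ratings)              # possible range 36 to 252
    problem = sum(1 for flag in problem_flags if flag)  # possible range 0 to 36
    return intensity, problem


# Boundary cases showing the minimum and maximum possible scores.
lowest = score_sesbi([1] * N_ITEMS, [False] * N_ITEMS)
highest = score_sesbi([7] * N_ITEMS, [True] * N_ITEMS)
print(lowest, highest)
```

The two boundary cases reproduce the ranges stated in the text: 36 to 252 for the intensity score and 0 to 36 for the problem score. How missing item responses should be handled is not addressed in this sketch.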

The goals of this study are to examine the validity of the SESBI with grade school
children, to establish cutoff scores for the SESBI scales, and to further examine the
adequacy of the items for fully capturing the range of disruptive behavior, potentially
creating a revision of the scale. To place the goals of this study in context, a
review of the development and psychometric characteristics of the SESBI follows.

Previous  Item  Analyses 
The  original  psychometric  study  of  the  SESBI  (Funderburk  &  Eyberg,  1989)  was 
conducted  on  55  non-referred  lower-middle  SES  preschoolers.    In  that  study,  the 
mean  frequency  of  occurrence  ratings  for  the  36  items  of  the  SESBI  ranged  from  1.3 
to  3.6  on  the  7-point  Intensity  Scale.  These  scores  indicated  that  the  behaviors 
occurred  "seldom"  to  "sometimes"  on  the  average.  The  standard  deviations  of  the 
behavior frequency ratings ranged from .9 to 2.2. The test-retest stability coefficients
for the items on the Intensity Scale ranged from .49 to .88, all of which were
significant (p < .001).

In  a  sample  of  1116  children  aged  5  to  11  for  which  SES  was  not  reported,  Burns 
and  Owen  (1990)  found  significant  gender  effects  on  the  SESBI  Intensity  Scale  for  28 
of  the  36  behaviors  and  therefore  conducted  item  analyses  by  gender.  The  mean 
frequency ratings of the items for males ranged from 1.08 for "hits teacher" to 3.23
for "dawdles in obeying rules or instructions." For females, the range of frequency
ratings for the items was from 1.02 for "hits teacher" to 2.47 for "dawdles in obeying
rules and instructions." The average frequency ratings for males and females were
2.08  (SD  =  1.11)  and  1.64  (SD  =  0.76),  respectively.  Item-to-total  correlations 
between  the  frequency  ratings  and  the  total  intensity  score  ranged  from  a  low  of  .30, 
for  "steals",  p  <  .05,  to  a  high  of  .89  for  "dawdles  in  obeying  rules  and 
instructions,"  p  <  .0001. 

Rayfield  and  Eyberg  (1994)  examined  the  SESBI  items  in  a  sample  of  720 
rural  middle  and  high  school  students.    The  mean  frequency  of  occurrence  ratings  for 
the  36  items  of  the  SESBI  ranged  from  1.04  to  2.33  on  the  7-point  Intensity  Scale. 
The  standard  deviations  of  the  behavior  frequency  ratings  ranged  from  .36  to  1.81. 
The  range  of  the  scores  on  each  item  was  1  to  7,  with  the  exception  of  "hits  teacher" 
and  "cries." 

Funderburk  and  Eyberg  (1989)  also  examined  the  Problem  Scale  items  of  the 
SESBI.  Each  item  was  endorsed  as  a  problem  by  1.8%  to  27.3%  of  the  respondents 
in this preschool-age sample. Point biserial item-to-total correlations for the Problem
Scale were significant for 35 of the 36 items, and ranged from .06, n.s., for "steals"
to .84, p < .0001, for "gets angry when doesn't get his or her own way." Rayfield
(1993)  found  that  each  item  was  endorsed  as  a  problem  between  .1%  (hits  teacher) 
and  15.8%  (fails  to  finish  tasks  or  projects)  of  the  time. 

Rayfield  (1993)  also  examined  17  experimental  items  that  were  rationally 
derived  from  the  DSM-III-R's  Disruptive  Behavior  Categories.  Based  on  frequency 
of  occurrence,  teacher  endorsement  as  a  problem,  and  item-to-total  correlations, 
Rayfield  (1993)  recommended  that  several  original  items  be  deleted  from  the  scale  and 
replaced  with  items  from  the  experimental  list.  Each  of  the  items  recommended  for 
deletion  was  rated  as  occurring  "never"  more  than  80%  of  the  time  and  as  being  a 
problem  less  than  5%  of  the  time.  This  study  will  further  examine  these  items  in  a 
clinic  population. 

Previous  Scale  Analyses 

Scale analyses from the Funderburk and Eyberg (1989) study found the Intensity
Scale  scores  ranged  from  a  low  of  36  to  a  high  of  228,  with  a  mean  intensity  score  of 
100.9,  and  a  standard  deviation  of  47.6.  The  distribution  of  the  intensity  scores  was 
approximately  normal.  The  problem  scores  ranged  from  a  low  of  0  to  a  high  of  33, 
with  a  mean  of  6  and  a  standard  deviation  of  8.8.  The  distribution  of  the  problem 
scores  was  reported  to  be  somewhat  positively  skewed. 

Scale analyses from the Burns and Owen (1990) study, conducted on 5- to
11-year-old children from a rural community, found the mean intensity score to be 67.4
with a standard deviation of 35.4, which appears substantially lower than the mean of
100.9 found for the younger urban sample in the Funderburk and Eyberg study. The
mean problem score was found to be 4.4 with a standard deviation of 7.5. Because
children whose teachers indicated that they had learning disabilities or
behavior problems were not included in the Burns and Owen (1990) "normative
sample," the lower mean scores are likely accounted for by the exclusion of all children
suspected of having behavior problems. However, because the age range, the urban
versus rural status, and possibly SES in these two samples also differed, it is not
possible to know the source of the apparent score differences from this study.

In  a  third  study  with  88  non-referred  high-SES  preschoolers  (Funderburk,  Eyberg, 
&  Behar,  1989),  the  mean  intensity  score  for  the  SESBI  was  53  with  a  standard 
deviation  of  23.9,  and  ranged  from  36  to  157.  The  problem  score  had  a  mean  of  .8 
with  a  standard  deviation  of  3.5,  and  ranged  from  0  to  25.  Contrary  to  what  the 
researchers  hypothesized,  the  high-SES  preschoolers  showed  significantly  lower  SESBI 
scores  than  the  lower-middle-SES  children  in  the  original  sample  (p  <  .01).  Thus,  it 
seemed  that  either  teachers  had  a  tendency  to  rate  children  of  high  SES  lower  than 
lower-middle-SES  children  on  the  SESBI,  or  that  children  from  high-SES  backgrounds 
did  not  display  as  many  disruptive  behaviors  at  school  as  same-age  lower-middle-SES 
children. 

Finally, Rayfield and Eyberg (1994) found the mean intensity score of the SESBI with a
large inclusive sample of older rural children to be 57.9 with a standard deviation of
32.6. Intensity scores ranged from 36 to 229 and problem scores ranged from 0 to
36. The mean problem score was 2.5 with a standard deviation of 5.9.


In  sum,  mean  scores  of  both  the  Intensity  Scale  and  the  Problem  Scale  have  varied 
widely  across  samples.  There  are  several  possible  explanations  for  these  differences 
including age of the children, urban versus rural status, and SES. Because of the
wide range of mean SESBI scores, no cutoff score has been established. One goal of
this study was to examine SESBI scores in both a clinical and a normative sample in
order to establish a cutoff score.

Relationship  between  the  Intensity  and  Problem  Scales 
Funderburk  and  Eyberg  (1989)  found  a  correlation  between  the  Intensity  Scale 
and the Problem Scale of .65, indicating that the two scales measure similar but not
identical  dimensions  of  behavior.  Funderburk  et  al.  (1989)  also  found  that  the  two 
scales  of  the  SESBI  had  a  moderate  correlation  of  .57.  With  a  sample  of  older 
children,  Rayfield  and  Eyberg  (1994)  found  a  similar  moderate  correlation  between 
the two scales of the SESBI (.60). In contrast, Burns and Owen (1990) found a
correlation between the Intensity and Problem Scales of .84. From their results,
Burns and Owen questioned whether the two scales provide useful separate sources of
information.  In  addition,  Burns  and  Owen  suggested  that  the  Problem  Scale  alone 
would  be  sufficient  for  a  screening  device.  It  is  probable,  however,  that  on  the  SESBI 
the  Problem  Scale  is  the  more  subjective  scale  that  taps,  to  a  greater  extent  than  the 
Intensity  Scale,  the  teacher's  personal  distress,  tolerance  for  misbehavior,  or 
defensiveness.  Kazdin,  Mazurick,  and  Bass  (1990)  suggested  that  two  measures  are 
redundant only if their correlation is .85 or greater.
It  is  also  important  to  recall  that  the  Burns  and  Owen  sample  excluded  children  with 
behavior  problems  and  learning  problems,  and  thus  their  data  may  not  provide 
information  relevant  to  screening  for  clinically  significant  behavior  problems. 
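
To make the redundancy argument concrete, the reported correlations between the two scales can be compared with the .85 criterion attributed above to Kazdin, Mazurick, and Bass (1990). The sketch below also reports the squared correlation, the conventional index of shared variance. That index is added here for illustration and is not part of the cited studies; the dictionary layout and function name are likewise hypothetical.

```python
REDUNDANCY_CUTOFF = 0.85  # criterion cited in the text

# Intensity-Problem Scale correlations as reported in the text.
reported = {
    "Funderburk & Eyberg (1989)": 0.65,
    "Funderburk et al. (1989)": 0.57,
    "Rayfield & Eyberg (1994)": 0.60,
    "Burns & Owen (1990)": 0.84,
}


def summarize(r, cutoff=REDUNDANCY_CUTOFF):
    """Shared variance (r squared) and whether r meets the cutoff."""
    return {"shared_variance": round(r * r, 2), "redundant": r >= cutoff}


summary = {study: summarize(r) for study, r in reported.items()}
for study, result in summary.items():
    print(study, result)
```

Under this criterion none of the four reported correlations marks the scales as redundant, although the Burns and Owen value falls just below the cutoff.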

Reliability 

In  the  original  standardization  study  (Funderburk  &  Eyberg,  1989)  the  SESBI 
Intensity  Scale  had  a  mean  item-to-total  correlation  of  .72,  p  <  .001,  and  the  internal 
consistency  of  the  Intensity  Scale  using  Cronbach's  alpha  was  .98.  The  Problem 
Scale  had  a  mean  item-to-total  correlation  of  .63  and  an  internal  consistency 
coefficient  of  .96.  Other  studies  have  found  similarly  high  internal  consistency  for 
both  the  Intensity  and  Problem  Scales  of  the  SESBI  with  both  older  children 
(Rayfield & Eyberg, 1993) and clinic-referred children (Schaughency, Hurley, Yano,
Seeley,  &  Talarico,  1989). 
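
For readers unfamiliar with the internal consistency coefficient reported above, the following sketch computes Cronbach's alpha from a small respondents-by-items table using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The data are hypothetical and far smaller than a 36-item SESBI protocol; they are not drawn from any of the studies cited.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")
    # Variance of each item across respondents (sample variance).
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed scale score across respondents.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five children rated on three items.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [6, 5, 7],
    [3, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Because alpha rises with the number of items as well as with their intercorrelation, very high values on a 36-item scale are consistent with, but do not by themselves establish, a single underlying dimension.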

It  has  been  demonstrated  that  many  instruments  have  a  tendency  to  yield  lower 
scores  on  second  administration,  even  after  brief  intervals,  and  such  systematic 
decreases  can  be  misinterpreted  as  improved  behavior  or  therapeutic  effects  of  an 
intervention (Barkley, 1988). On the SESBI, test-retest correlations have ranged from
.90 to .94 for the Intensity Scale and .89 to .98 for the Problem Scale (Funderburk &
Eyberg, 1989; Funderburk et al., 1989; Ladish, Sosna, Warner, & Burns, 1989). In
addition, Funderburk and Eyberg (1989) found that neither the SESBI intensity nor
problem scores decreased on second administration after a one-week interval. In
contrast, Ladish and her colleagues (1989) did find a significant decrease (p < .002)
in the intensity, but not problem, scores on second administration after a three-week
interval. The average decrease in the intensity score was 10 points, which represented
about  1/4  standard  deviation  in  their  sample.  It  should  be  noted  that  the  time  intervals 
from  the  two  studies  are  different  and  could  be  responsible  for  the  different  results. 
Thus  it  was  important  to  cross-validate  stability  across  time  periods  longer  than  one 
week,  because  a  decrease  on  second  administration  would  be  important  information 
for  clinical  use  of  the  SESBI.    The  current  study  obtained  test-retest  information  over 
a  longer  period  of  time  (6  weeks)  to  examine  the  extent  to  which  SESBI  scores 
decrease  over  time. 

Funderburk  and  Eyberg  (1989)  found  that  the  inter-rater  reliabilities  across 
seven  pairs  of  teachers,  rating  from  4  to  10  children  each,  ranged  from  .60  to  .97 
yielding  a  weighted  mean  correlation  of  .85  for  the  Intensity  Scale.  For  the  Problem 
Scale,  the  inter-rater  reliabilities  ranged  from  .64  to  .94  with  a  weighted  mean  of  .87. 
Ladish  et  al.  (1989)  found  similar  results  in  a  non-referred  sample  of  60  preschoolers; 
for  the  Intensity  Scale,  the  correlation  was  .86  and  for  the  Problem  Scale,  the 
correlation  was  .84.  Finally,  Dumas  (1992)  found  that  mean  intensity  and  problem 
scores  for  144  preschoolers  were  consistent  across  teachers.  These  results  indicate 
that  when  two  teachers  independently  complete  the  SESBI  on  the  same  child,  there  is 
a  high  level  of  agreement,  at  least  with  preschool-age  children. 
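
A weighted mean correlation of the kind reported above can be sketched as follows. The weighting scheme is an assumption: the text does not state how the mean was weighted, so the sketch weights each teacher pair by the number of children it rated. The correlations and sample sizes are hypothetical and do not reproduce the values from Funderburk and Eyberg (1989).

```python
def weighted_mean_correlation(correlations, n_children):
    """Mean of correlations weighted by the number of children per pair."""
    if len(correlations) != len(n_children) or not correlations:
        raise ValueError("need one sample size per correlation")
    total_n = sum(n_children)
    return sum(r * n for r, n in zip(correlations, n_children)) / total_n


# Hypothetical teacher pairs: inter-rater r and number of children rated.
pair_correlations = [0.70, 0.80, 0.90]
pair_sizes = [4, 6, 10]

weighted = weighted_mean_correlation(pair_correlations, pair_sizes)
unweighted = sum(pair_correlations) / len(pair_correlations)
print(round(weighted, 2), round(unweighted, 2))
```

Weighting by sample size gives more influence to pairs that rated more children, which matters when individual correlations rest on as few as four ratings. Averaging after a Fisher z transformation is a common alternative that this sketch does not implement.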

Validity 

The  SESBI  has  demonstrated  concurrent  validity  with  several  teacher  rating 
measures  as  well  as  sociometric  measures  and  direct  observation.  Funderburk  and 
Eyberg (1989) examined concurrent validity by having preschool teachers complete
both  the  SESBI  and  the  Preschool  Behavior  Questionnaire  (PBQ)  (Behar  &  Stringfield, 
1974).  Correlations  between  the  PBQ  total  score  and  both  the  intensity  score,  r  = 
.76,  and  problem  score,  r  =  .61,  of  the  SESBI  were  highly  significant,  ps  <  .0001. 
The  SESBI  has  also  been  compared  to  the  Conners  TRS  to  examine  concurrent 
validity  (Ladish  et  al.,  1989;  Newcomb,  Eyberg,  Bodiford,  Eisenstadt,  &  Funderburk, 
1989). Newcomb et al. found a correlation of .93 (n = 42), p < .0001, between the
Conners  TRS  total  score  and  the  SESBI  Intensity  Scale.  Other  investigators  have 
found  similar  results.  Dumas  (1992)  compared  the  SESBI  to  the  Preschool  Socio- 
Affective  Profile  (PSP-S)  (Freniere,  Dumas,  Capuanto,  &  Dubeau,  1992)  in  a  sample 
of  144  non-referred  preschoolers.  Results  indicated  that  both  the  Intensity  and 
Problem  Scales  correlate  significantly  with  the  Anger-Aggression  subscale  of  the  PSP- 
S,  r  =  .79,  p  <  .001  and  r  =  .54,  p  <  .001,  respectively.  In  addition,  both  scales 
of  the  SESBI  are  moderately  correlated  with  the  Social  Competency  subscale  of  the 
PSP-S,  r  =  -.62,  p  <  .001,  for  the  Intensity  Scale  and  r  =  -.50,  p  <  .001,  for  the 
Problem  Scale.  Finally,  both  scales  of  the  SESBI  are  also  moderately  correlated  with 
the Anxiety-Withdrawal subscale of the PSP-S, r = .43, p < .001, and r = .21, p
< .01, respectively.

Several  studies  have  also  examined  the  discriminant  validity  of  the  SESBI.  The 
SESBI  has  been  compared  with  the  CBCL-TRF  to  assess  discriminant  validity  in 
samples  of  clinic-referred  children  age  5  to  14  (Schaughency  et  al.,  1989).  In  contrast 
to  the  high  correlations  between  the  externalizing  score  on  the  TRF  and  both  the 
Intensity, r(43) = .87, p < .001, and the Problem Scales, r(42) = .71, p < .001, of
the SESBI, the internalizing score on the TRF was found to be unrelated to the
Problem Scale, r(43) = .14, n.s., and only mildly related to the Intensity Scale, r(42)
= .25, p < .05, of the SESBI. Results also indicated that the children with
diagnoses  of  externalizing  disorders  obtained  higher  SESBI  intensity,  F(2,  105)  = 
14.62,  p  <  .001,  and  problem,  F(2,  101)  =  7.98,  p  <  .001,  scores  than  children  in 
either  an  internalizing  disorders  sample  or  a  sample  of  clinic-referred  children 
receiving  no  diagnosis  (Schaughency  et  al.,  1989).  Finally,  a  study  conducted  by 
Newcomb  et  al.  (1989)  further  examined  discriminant  validity  by  comparing  SESBI 
scores  of  preschool  children  referred  for  treatment  with  SESBI  scores  from  two  non- 
referred  contrast  samples  of  children  drawn  from  the  same  classrooms  as  each  of  the 
referred  children.  The  first  contrast  sample  consisted  of  children  described  by  the 
school  director  as  presenting  behavior  problems  for  the  teacher.  The  second  contrast 
sample  consisted  of  children  described  as  presenting  average  classroom  behavior.  The 
children  in  the  treatment  group  were  diagnosed  with  Oppositional  Defiant  Disorder, 
Attention  Deficit  Hyperactivity  Disorder,  Conduct  Disorder,  or  some  combination  of 
these  three.  The  results  of  the  study  showed  that  the  SESBI  was  not  only  able  to 
discriminate  between  normal  children  and  those  with  behavior  problems  but  also  both 
scales  were  able  to  distinguish  children  with  severe  enough  behavior  problems  to  be 
referred  for  treatment  from  non-referred  children  described  as  presenting  behavior 
problems  in  the  classroom. 

The  data  from  the  Funderburk  and  Eyberg  (1989)  study  were  also  analyzed  for 
relationships between group membership in the non-referred sample, a clinic sample
of children referred for learning or developmental problems, and a clinic-referred
behavior problem sample. There was a significant effect for group membership for
both the Intensity Scale, F(2, 80) = 10.85, p < .0001, and the Problem Scale,
F(2, 80) = 16.91, p < .0001. Post hoc analyses showed that the sample referred for
learning  or  developmental  problems  did  not  differ  from  the  non-referred  sample, 
whereas  the  sample  referred  for  behavior  problems  had  significantly  higher  SESBI 
scores  than  both  the  non-referred  sample  and  the  sample  referred  for  developmental 
problems. 

Burns  and  Owen  (1990)  compared  a  normal  sample  of  children  with  a  sample  of 
learning  disabled  children,  a  sample  of  children  receiving  counseling  for  behavior 
problems,  and  a  sample  of  children  receiving  services  for  reading  difficulties,  from  the 
same  school  district.  The  group  that  was  in  counseling  for  behavior  problems  had 
significantly  higher  intensity  and  problem  scores  than  the  other  groups.  The  learning 
disabled  group  had  significantly  higher  intensity  and  problem  scores  than  the  normal 
group  and  higher  intensity  scores  than  the  reading  difficulty  group. 

Rayfield  and  Eyberg  (1994)  found  that  children  who  were  diagnosed  with  learning 
disabilities  obtained  higher  intensity  scores,  but  not  problem  scores,  than  their  non- 
learning  disabled  peers.  This  was  interpreted  to  mean  that  the  children  with  LD 
showed  behavior  problems  more  frequently  than  their  non-learning  disabled  peers  but 
that the teachers viewed themselves as able to handle these situations. In contrast,
mentally  retarded  children  were  not  rated  as  having  significantly  different  rates  of 
behavior problems than their non-retarded peers. The increase in behavior problems
in  children  with  learning  disabilities  is  consistent  with  the  DSM-III-R  and  other 
findings (Toro, Weissberg, Guare, & Liebenstein, 1990) that children diagnosed with
learning  disabilities  are  more  likely  than  children  without  learning  problems  to  have 
behavior  problems. 

The  SESBI  has  also  been  studied  as  a  treatment  outcome  measure.  McNeil  et  al. 
(1991)  studied  three  samples.  The  first  was  a  treatment  sample,  the  second  was  a 
normal  control  sample,  and  the  third  was  an  untreated  deviant  classroom  control 
sample.  Results  indicated  that  the  SESBI  was  sensitive  to  the  improvements  of  the 
treated  children.  In  the  treatment  sample,  the  mean  SESBI  intensity  score  dropped 
from  155  to  116  whereas  scores  for  the  children  in  the  untreated  control  groups 
showed  no  significant  change.  The  authors  concluded  that  the  SESBI  is  a  sensitive 
instrument  for  use  in  evaluating  change  in  behavior  problems  resulting  from  treatment. 

Factor  Analyses 

Burns and Owen (1990) conducted principal components factor analyses on the
data from their normal sample of 5- to 11-year-olds. For the Intensity Scale the first
unrotated  factor  accounted  for  51.3%  of  the  variance  and  the  second,  third,  and  fourth 
unrotated  factors  accounted  for  7.0%,  4.6%,  and  3.9%  of  the  variance,  respectively. 
All  items  loaded  positively  on  the  first  factor,  and  all  items  had  loadings  greater  than 
.30  on  the  first  factor  with  the  exception  of  the  item,  "hits  teacher"  which  had  an  item 
loading  of  .25.  The  Problem  Scale  showed  the  same  pattern,  with  the  first  unrotated 
factor  accounting  for  43.8%  of  the  variance  and  the  second,  third,  and  fourth 
unrotated factors accounting for 6.9%, 4.3%, and 3.9% of the variance, respectively.
Again, all of the items with the exception of the "hits teacher" item loaded positively
on the first factor with a loading of at least .30. The "hits teacher" item had a
loading of .25. A four factor solution resulted in only 3 items loading onto the fourth
factor, leading the researchers to conclude that a three factor solution seemed most
appropriate. The authors suggested that Factor 1 represents overt aggression toward
others.  Factor  2  was  suggested  to  represent  oppositional  behavior,  and  Factor  3  was 
judged  representative  of  attentional  difficulties. 
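
To illustrate how the percentage of variance attributed to each unrotated component is obtained, the following minimal Python sketch extracts components from an item correlation matrix. It is not the procedure used by Burns and Owen (1990), whose software and settings are not described here; the function name explained_variance_ratio and the synthetic ratings are assumptions introduced only for illustration.

```python
import numpy as np

def explained_variance_ratio(item_scores):
    """Proportion of total variance carried by each unrotated principal component.

    item_scores: 2-D array, rows = children, columns = items.
    Components are extracted from the item correlation matrix.
    """
    corr = np.corrcoef(item_scores, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sort largest first
    return eigenvalues / eigenvalues.sum()

# Synthetic ratings (NOT SESBI data): one shared trait plus item-specific noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
ratings = trait + 0.8 * rng.normal(size=(500, 6))
ratios = explained_variance_ratio(ratings)
print(np.round(ratios, 3))
```

When a single trait drives most items, as in this synthetic example, the first ratio dominates and the remaining ratios are small, which is the pattern the studies reviewed here describe for the first unrotated factor.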

Funderburk  et  al.  (1989)  combined  the  data  from  the  original  study  of  55  non- 
referred  lower-middle-SES  preschoolers  (Funderburk  &  Eyberg,  1989)  with  the  data 
from  a  sample  of  88  high-SES  preschoolers  to  conduct  factor  analyses  of  the  SESBI 
Intensity  and  Problem  Scales.  Because  of  the  differences  on  the  demographic 
variables,  effects  related  to  location  were  partialled  out  of  the  factor  analyses.  Similar 
to the results from the Burns and Owen (1990) study, they found that the first factor
accounted  for  53%  of  the  variance  in  the  Intensity  Scale  and  all  of  the  36  items  had  a 
positive  loading  of  .36  or  greater  on  this  factor.  For  the  Problem  Scale,  45%  of  the 
variance  was  accounted  for  by  the  first  factor  and  all  but  one  of  the  items  ("steals") 
had  a  positive  loading  of  .30  or  greater.  No  other  factor  accounted  for  more  than 
10%  of  the  variance  for  either  scale.  In  contrast  to  the  conclusions  by  Burns  and 
Owen,  however,  Funderburk  et  al.  suggested  that  the  additional  factors  accounting  for 
less  than  10%  of  the  variance  did  not  significantly  enhance  the  utility  of  the  SESBI, 
and  they  therefore  recommended  a  one  factor  solution. 

Interestingly,  a  similar  conclusion  was  suggested  by  Ladish  and  colleagues  (Ladish 
et  al.,  1989)  following  factor  analyses  of  the  SESBI  with  60  non-referred  preschool 
children.  In  the  latter  study,  in  which  the  first  factor  accounted  for  61  %  and  47%  of 
the  variance  in  the  Intensity  and  Problem  Scales,  respectively,  the  authors  noted  that 
the  low  variance  accounted  for  by  additional  factors  (i.e.,  <  10%)  indicated  that  the 
SESBI  is  a  unidimensional  measure  of  conduct  problem  behaviors  (Ladish  et  al., 
1989). 

Factor  analyses  of  the  SESBI  have  also  been  conducted  by  Schaughency  and 
colleagues  (1989)  using  data  from  113  clinic-referred  children  aged  5  to  14.  On  the 
Intensity  Scale,  the  first  factor  accounted  for  38%  of  the  variance,  which  is  somewhat 
less  than  previously  reported  from  samples  of  non-referred  children.  These  authors 
suggested  that  this  factor  taps  into  conduct  problem  behavior.  Their  second  factor 
accounted  for  9%  of  the  variance  and  was  suggested  to  tap  negative  emotionality. 
Finally, a third factor, which accounted for 8% of the variance, was described as
tapping attentional difficulties of the child. For the Problem Scale they found the first
factor accounted for 27% of the variance, but it was comprised of items they referred
to as negative emotionality. The second factor accounted for 8% of the variance and
was suggested to tap conduct problem behavior. Finally, a third factor, accounting for
7% of the variance, was again suggestive of attentional difficulties.

The  results  of  the  factor  analyses  reported  by  Schaughency  et  al.  (1989)  are 
somewhat different from those found in other studies. Not only did the first factor of
both the Intensity and Problem Scales account for less variance than in other studies,
but also the first unrotated factor of each scale of the SESBI was different in these
analyses.  One  explanation  for  this  difference  may  be  that  Schaughency  et  al. 
examined  a  clinic-referred  population  as  opposed  to  the  non-referred  samples  in  other 
studies. 

Rayfield  (1993)  conducted  a  principal  components  analysis  on  the  original  SESBI 
scales  with  a  large  diverse  sample  of  nonreferred  students  and  concluded  that  a  two 
factor solution was the most meaningful. The first unrotated factor was large,
accounting for 57% of the variance, and contained items related to oppositionality.
The second factor was much smaller, accounting for 6% of the variance. It was made
up of items related to attention. Similar results were found with the Problem Scale.
Rayfield  (1993)  also  conducted  a  principal  components  analysis  after  adding  17 
experimental  items  to  the  SESBI.  The  items  were  designed  specifically  to  enhance  the 
attention and aggression factors suggested by Burns and Owen (1990). The results
suggested  that  although  the  items  slightly  increased  the  variance  accounted  for  by  the 
attention  factor  (7.3%  of  the  variance),  the  oppositional  factor  remained  strong 
(62.3%  of  the  variance)  and  the  aggression  items  did  not  form  an  interpretable  factor. 

Previous  factor  analyses  with  the  SESBI  (Rayfield,  1993)  have  found  that  the 
motor  activity  items  do  not  load  on  the  same  factor  as  the  items  tapping  attentional 
problems.  The  hyperactivity  items  load  with  other  items  that  are  thought  to  be  more 
representative  of  oppositionality.  This  factor  pattern  is  consistent  with  the  DSM-IV 
(1994)  which  created  subtypes  separating  hyperactivity  from  the  more  inattentive 
features  of  ADHD. 

Gender, Age, and Ethnicity Effects

In  the  initial  Funderburk  and  Eyberg  (1989)  study  of  preschoolers,  no  significant 
gender  effects  were  obtained  for  either  the  Intensity  or  the  Problem  Scale  of  the 
SESBI. However, other studies (Funderburk et al., 1989; Ladish et al., 1989;
Schaughency  et  al.,  1989)  have  found  gender  effects  on  one  or  both  scales.  Burns  and 
Owen  (1990),  who  have  studied  the  largest  sample  to  date,  found  significant  gender 
effects  on  both  scales.  In  further  analyses  they  used  the  epsilon  squared  statistic  to 
estimate  amount  of  variance  accounted  for  by  gender.  Their  results  indicated  that 
gender  accounted  for  5%  and  4%  of  the  Intensity  and  Problem  Scale  variance, 
respectively.  Rayfield  and  Eyberg  (1993)  also  found  significant  effects  of  gender  with 
middle  school  and  high  school  children  on  both  scales  of  the  SESBI  with  boys  being 
rated  with  more  behavior  problems  than  girls.  It  may  be  the  case  that,  in  the 
preschool  years,  boys'  and  girls'  behavior  is  more  similar  and  becomes  more  disparate 
as  children  get  older. 
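
For readers unfamiliar with the epsilon squared statistic, it is conventionally defined for a one-way design as (SS_between - df_between x MS_within) / SS_total. The sketch below assumes that conventional definition; the exact computation used by Burns and Owen (1990) is not reported here, and the function name and the example scores are hypothetical.

```python
import numpy as np

def epsilon_squared(groups):
    """Epsilon squared: (SS_between - df_between * MS_within) / SS_total."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between - df_between * ms_within) / (ss_between + ss_within)

# Made-up scores for two groups (e.g., boys and girls), NOT data from any study.
example = epsilon_squared([[3, 4, 5, 6], [1, 2, 3, 4]])
```

Because the statistic subtracts the variance expected by chance, it is somewhat smaller than the uncorrected proportion SS_between / SS_total.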

Many of the studies with the SESBI have not found age effects (e.g., Burns &
Owen, 1990; Funderburk & Eyberg, 1989; Ladish et al., 1989; Funderburk et al.,
1989). Most of the studies examined age within a restricted age range of preschool
children. Burns and Owen (1990) studied a broader age range (ages 5 to 11), but even
within  this  broader  range,  no  age  effects  were  detected.  In  addition,  Schaughency  et 
al.  (1989),  with  a  clinic-referred  sample  of  children  aged  5  to  14,  found  there  were  no 
significant  age  effects.  In  contrast,  Rayfield  and  Eyberg  (1993)  found  significant 
differences  in  age,  with  older  children  (9th  -  12th  grades)  being  rated  with  fewer 
behavior  problems  than  younger  children  (5th  -  8th  grades).  The  correlations  between 
grade and SESBI scores, however, were low: r = .06 for the Problem Scale; r = .12
for the Intensity Scale. The changes in behavior, therefore, do not appear to be a
linear function of grade.

The  demographic  variable  of  ethnicity  has  received  the  least  attention  in  all  studies 
of  the  SESBI.  Rayfield  and  Eyberg  (1994)  examined  the  effects  of  race  on  SESBI 
scores  with  a  large  sample  of  African  American  and  Caucasian  children.  An 
interaction  was  found  showing  higher  SESBI  scores  for  African  American  than 
Caucasian  children  at  younger  ages.  Although  SES  data  were  not  directly  available, 
school  records  showed  that  69%  of  African  American  children  were  receiving  free  or 
reduced lunch in contrast to 5% of Caucasian children receiving free or reduced
lunch. In light of this information, the effect of race found was viewed as being a
marker for SES. These results are consistent with those of Offord and Racine (1989)
who found that teacher ratings of conduct disorder were associated with low income,
especially with younger children. Rayfield and Eyberg (1994) also examined whether
the race of the teacher had differential effects on SESBI scores for Caucasian and
African American students. The results showed that teacher race did not affect SESBI
scores. 


Purposes  of  the  Present  Study 
One  purpose  of  the  present  study  was  to  compare  the  SESBI  and  SESBI-R  on  a 
wide array of reliability and validity indices in an effort to make a recommendation as
to which form of the measure should be used. In addition, the ratings from the two
groups of children (those in ESE classes and those in regular classes) were used to
determine cut off scores to be used with the SESBI and SESBI-R to help judge the
clinical significance of behavior problems displayed in the classroom.


METHOD 
Participants 

Fifty-two  teachers  from  11  different  schools  completed  rating  scales  on  415 
elementary  school  age  children  in  Alachua  County.  Among  the  52  teachers,  8  were 
African  American  and  the  remainder  were  Caucasian.  There  were  3  male  teachers  in 
the  sample.  There  were  8  kindergarten  teachers,  7  first  grade  teachers,  9  second 
grade  teachers,  8  third  grade  teachers,  6  fourth  grade  teachers,  and  8  fifth  grade 
teachers.  Six  teachers  taught  classes  containing  students  at  various  grade  levels. 

Among  the  children  rated,  there  were  101  children  in  exceptional  student 
education  classes.  In  order  to  be  placed  in  ESE  classes,  children  must  first  undergo 
an  initial  screening  and  classroom  interventions  must  be  attempted.  A  psychological 
evaluation  is  also  required  to  place  a  child  in  ESE  classes.  The  remaining  students 
were  in  regular  classrooms.  The  sample  contains  206  Caucasian  children,  205 
African-American  children,  2  Hispanic  children,  and  2  Bi-racial  children.  There  were 
223  males  and  192  females  in  the  sample.  There  were  29  children  whose  teacher 
indicated they had been diagnosed with a learning disability and 7 children whose
teacher  indicated  that  the  child  was  mentally  retarded. 

To  maintain  confidentiality,  participant  demographic  data  could  be  obtained 
only from information readily known to the teacher (e.g., race, grade, and sex), and
school records could not be used. Thus data on socioeconomic status were not
available.  However,  because  the  sample  was  drawn  from  eleven  different  schools 
spread  across  the  city  of  Gainesville,  socioeconomic  information  about  the  city  was 
obtained  to  provide  an  approximate  description  of  the  sample.  According  to  1990 
census  data,  the  population  is  84,770.  The  mean  and  median  annual  incomes  were 
$29,844  and  $21,077  respectively,  with  15.7%  of  the  population  being  below  poverty 
level. 

Measures 

The SESBI and SESBI-R were both contained within one 53-item measure given to the
teachers. The measure was scored to yield Intensity and Problem Scale scores for
both versions of the SESBI. In addition to the SESBI and REdSOCS, the Child
Behavior Checklist-Teacher Report Form (CBCL-TRF) (Edelbrock & Achenbach,
1984)  was  administered  to  the  teachers.  This  scale  is  another  widely  used  teacher 
rating  scale.  The  TRF  contains  academic  and  adaptive  functioning  scales  as  well  as 
118  specific  behavior  problem  items.  There  are  eight  narrow  band  syndrome  scales 
that  assess  a  range  of  psychopathology  including  anxiety  and  depression,  social 
withdrawal,  obsessive-compulsive,  somatic  complaints,  social  problems,  thought 
problems,  attention  problems,  delinquent  behavior,  and  aggression.  The  scale  also 
contains  two  broad  band  scales,  externalizing  and  internalizing.  The  test-retest 
reliability  has  ranged  from  .82  to  .96  (Achenbach,  1991).  Inter-teacher  reliability  for 
children  referred  for  evaluation  has  been  reported  to  range  from  .30  to  .66  for  the 
total  scale  (Achenbach,  1991).  The  CBCL-TRF  has  also  demonstrated  concurrent  and 
discriminant validity (Achenbach, 1991; Barkley, 1990; Kazdin, Esveldt-Dawson, &
Loar,  1983). 

Procedure 

As  required  by  the  Alachua  County  Public  School  system,  an  application  for 
conducting  research  in  the  schools  was  sent  to  the  principals  of  14  elementary  schools 
in  Gainesville.  Of  these  schools,  3  declined  participation.  One  noted  that  several 
other  research  projects  were  ongoing  in  the  school  and  the  remaining  two  did  not  give 
reasons  for  their  decline. 

The  SESBI  forms  and  the  CBCL-TRF  were  given  to  the  teachers  who  agreed 
to  participate  in  the  study.  Each  teacher  was  asked  to  rate  2  Caucasian  females,  2 
Caucasian  males,  2  African  American  females,  and  2  African  American  males  in  their 
classroom. Teachers were asked to start at the beginning of their roll book and go
down the roll until each category was filled. Fifteen teachers completed the SESBI
approximately  six  weeks  later  on  the  same  children.  Seven  of  these  teachers  taught 
ESE  classrooms. 
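
The selection rule given to teachers (work down the class roll, taking the first two children encountered in each race-by-sex category) can be sketched as follows. This is only an illustration of the rule as described above; the function name select_by_quota and the roll entries are hypothetical.

```python
def select_by_quota(roll, quota=2):
    """Walk the roll in order, keeping the first `quota` children per race-by-sex category."""
    categories = [(race, sex)
                  for race in ("Caucasian", "African American")
                  for sex in ("female", "male")]
    chosen = {category: [] for category in categories}
    for name, race, sex in roll:
        key = (race, sex)
        if key in chosen and len(chosen[key]) < quota:
            chosen[key].append(name)
        if all(len(names) == quota for names in chosen.values()):
            break  # every category is filled, so stop reading the roll
    return chosen

# Hypothetical roll entries in roll-book order: (identifier, race, sex).
roll = [
    ("child 01", "Caucasian", "male"),
    ("child 02", "Caucasian", "male"),
    ("child 03", "Caucasian", "male"),  # skipped: category already full
    ("child 04", "African American", "female"),
]
selected = select_by_quota(roll)
```

A classroom that lacks two children in some category simply leaves that category short, which may be one reason the number of children rated per teacher was not always eight.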

The questionnaire also asked for the child's roll book number, grade, sex,
ethnicity,  and  information  regarding  learning  problems  of  the  child  including  learning 
disabilities  and  mental  retardation.  In  appreciation  for  their  participation,  teachers 
were  paid  a  30  dollar  honorarium  for  the  initial  participation  and  an  additional  15 
dollars  if  they  completed  the  SESBIs  again  at  re-test. 

The  REdSOCS  was  used  to  observe  the  classroom  behavior  of  sixty  children  in 
15  different  classrooms.  The  REdSOCS  uses  a  time  sampling  procedure  involving  six 
10-second-long  coding  intervals.  Four  children  were  observed  in  an  alternating 
fashion,  so  that  one  round  of  observation  lasted  approximately  4  minutes  (one  minute 
per child). Five rounds of observation, or 5 minutes, were completed for each child
on  each  of  3  days.  This  procedure  yielded  15  minutes  of  data  per  child.  Reliability 
data  were  collected  on  31  of  the  45  (69%)  classroom  observation  sessions.  The 
observations  were  conducted  during  a  structured  classroom  time  due  to  the  finding  that 
coding  systems  have  been  more  sensitive  to  changes  in  structured  classroom  time  than 
in unstructured classroom time (Abikoff, Gittelman-Klein, & Klein, 1980; Jacob,
O'Leary, & Rosenblad, 1978).
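
The arithmetic behind the observation schedule can be checked with a short calculation. The constants below are taken from the description above; the variable names, and the assumption that each observation session covers the four children of one classroom on one day, are introduced here for illustration.

```python
INTERVALS_PER_TURN = 6          # coding intervals per child per round
SECONDS_PER_INTERVAL = 10
CHILDREN_PER_SESSION = 4        # assumed: four children observed per classroom session
ROUNDS_PER_DAY = 5
DAYS = 3
CHILDREN_OBSERVED = 60
SESSIONS_WITH_RELIABILITY = 31

minutes_per_turn = INTERVALS_PER_TURN * SECONDS_PER_INTERVAL / 60   # one minute per child
minutes_per_round = minutes_per_turn * CHILDREN_PER_SESSION         # four minutes per round
minutes_per_child = minutes_per_turn * ROUNDS_PER_DAY * DAYS        # fifteen minutes in all
intervals_per_child = INTERVALS_PER_TURN * ROUNDS_PER_DAY * DAYS    # coded intervals per child
total_sessions = CHILDREN_OBSERVED // CHILDREN_PER_SESSION * DAYS   # classrooms times days
reliability_share = SESSIONS_WITH_RELIABILITY / total_sessions      # share with a second coder
```

Under these assumptions the totals agree with the figures reported above (15 classrooms, 45 sessions, and reliability data on about 69% of sessions).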

Coders  were  trained  to  80%  inter-rater  reliability  (percent  agreement)  on  all 
categories  before  data  collection  began.  The  primary  coder  was  a  graduate  student  in 
the  Department  of  Clinical  and  Health  Psychology.  There  were  three  reliability 
coders,  two  undergraduate  students  who  received  course  credit  for  their  participation 
and  another  graduate  student.  The  coders  were  trained  by  graduate  students  currently 
using  REdSOCS.  Coders  were  first  given  a  didactic  session  on  the  coding  system  by 
a  trained  graduate  student.  Coders  then  practiced  coding  taped  classroom  sessions 
with the graduate student trainer. Finally, coders practiced live classroom coding with
a trained observer. In all, each coder received approximately 10 hours of training.

Hypotheses 

Specific  Hypotheses  of  the  study  for  both  the  SESBI  and  SESBI-R  were  as 
follows: 


1.  Children  in  ESE  classrooms  will  obtain  higher  SESBI  scores  than  same 
age,  race,  and  sex  children  in  regular  classrooms. 

2.  Males  will  be  rated  by  their  teachers  as  having  more  behavior  problems 
than  their  female  peers. 

3.  The  SESBI  scales  will  correlate  positively  with  the  externalizing  scale  of  the 
CBCL-TRF. The correlations between the SESBI scales and the externalizing
scale of the CBCL-TRF will be significantly higher than the correlations
between the SESBI scales and the internalizing scale of the CBCL-TRF.

4.  Within  each  demographic  grouping  from  the  sample,  both  Scales  of  the 
SESBI  will  be  highly  reliable. 

5.  Observed  non-compliance  to  teacher  commands  will  be  associated  with 
higher  SESBI  scores,  particularly  the  Problem  Scale  of  the  SESBI. 

6.  All  three  categories  of  the  REdSOCS  will  correlate  positively  with  Intensity 
Scale  scores. 

7.  The  SESBI  scores  will  remain  stable  in  the  6  weeks  between  test  and  re- 
test. 

8. Neither teacher race nor sex will affect SESBI scores.

Analyses 

Individual  items  on  the  SESBI  were  first  analyzed  for  homogeneity  with  the  total 
scale  as  measured  by  item-to-total  correlations  on  the  Intensity  Scale  and  Problem 
Scale.  Means  and  standard  deviations  were  calculated  for  each  item.  The  normality 
of  each  item  and  both  the  Intensity  and  Problem  Scales  were  examined.  Cronbach's 
alpha  was  calculated  to  establish  internal  consistency.  The  cut  off  scores  were 
obtained  using  the  means  for  children  in  ESE  classrooms  and  the  means  for  children 
in  regular  classrooms.  The  mid-point  between  the  two  means  was  considered  the  cut 
point.  The  data  were  analyzed  for  effects  of  type  of  classroom,  teacher  race,  child 
race, child gender, and grade using PROC MIXED, an analysis of variance method in SAS.
Cohen's  Kappa  was  used  to  calculate  inter-observer  reliability  with  REdSOCS  data. 
For  all  the  correlational  analyses,  the  data  were  first  weighted  (Caucasian  by  1.70, 
African  American  by  .30)  to  resemble  more  accurately  the  proportion  of  African 
Americans  in  the  population,  and  thus  make  the  results  more  externally  valid. 
Pearson  correlations  were  used  to  compare  REdSOCS  categories  to  the  Intensity  and 
Problem  Scale  scores  of  the  SESBI.  Pearson  correlations  were  also  used  to  compare 
the  Intensity  and  Problem  Scale  scores  to  the  internalizing  and  the  externalizing  raw 
scores  of  the  CBCL-TRF.  As  suggested  in  the  CBCL-TRF  manual,  the  raw  scores 
were  used  to  provide  more  variability  for  correlational  analyses.  All  analyses  were 
performed  on  the  original  36-item  version  of  the  SESBI  and  the  revised  version 
proposed  by  Rayfield  and  Eyberg  (1993).  A  principal  components  analysis  was 
performed on the SESBI-R. An oblique (oblimin) rotation was used,
specifying  a  two  factor  solution,  to  validate  the  previous  solution  found  by  Rayfield 
(1993).  Cronbach's  alpha  was  calculated  on  the  two  resulting  factors.  Pearson 
correlations  were  used  to  examine  the  relationship  of  the  ADHD  factor  of  the  SESBI 
with  the  off-task  category  of  the  REdSOCS  and  the  Attention  Problems  narrow-band 
syndrome  scale  of  the  CBCL-TRF. 
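
The cut point rule described above (the mid-point between the ESE mean and the regular-classroom mean) amounts to a one-line calculation. The following is a minimal sketch under that description; the function name midpoint_cutoff and the scores are hypothetical, and treating only scores strictly above the cut point as exceeding it is an assumption.

```python
def midpoint_cutoff(ese_scores, regular_scores):
    """Cut point halfway between the ESE mean and the regular-classroom mean."""
    ese_mean = sum(ese_scores) / len(ese_scores)
    regular_mean = sum(regular_scores) / len(regular_scores)
    return (ese_mean + regular_mean) / 2

# Made-up Intensity Scale scores, NOT data from this study.
cutoff = midpoint_cutoff([150, 170], [90, 110])

# Assumption: a score must be strictly greater than the cut point to exceed it.
exceeds_cutoff = [score > cutoff for score in (95, 128, 131, 160)]
```

Because the mid-point depends only on the two group means, it does not take the spread or the relative size of the two groups into account.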


RESULTS 
Item  Analyses 

Item  Analyses  of  the  SESBI 

Across  all  415  children  in  the  sample,  the  mean  frequency  of  occurrence 
ratings  for  the  36  items  of  the  original  SESBI  ranged  from  1.2  for  "hits  teacher"  to 
3.4  for  "has  difficulty  staying  on  task"  on  the  7-point  scale.  This  indicates  that  the 
behaviors  occurred  "seldom"  to  "sometimes"  on  the  average.  The  standard  deviations 
of  the  behavior  frequency  ratings  ranged  from  .67  to  2.21.  The  range  of  scores  of 
each  item  on  the  Intensity  Scale  was  1  to  7  with  the  exception  of  "hits  teacher"  (range 
1 to 6). The items were somewhat positively skewed (skewness ranged from .26 to 4.27).

Each  item  was  rated  as  a  problem  between  3%  (hits  teacher)  and  30%  (teases 
or  provokes  other  students)  of  the  time.  The  Problem  Scale  items  were  also  positively 
skewed (skewness ranged from .81 to 3.29).

Item Analyses of the SESBI-R

Across  the  415  children  in  the  sample,  the  mean  frequency  of  occurrence  ratings 
for  the  38  items  of  the  SESBI-R  ranged  from  2.5  for  "has  difficulty  entering  groups" 
to  3.4  for  "has  difficulty  staying  on  task"  on  the  7-point  scale.  This  indicates  that  the 
behaviors  occurred  "seldom"  to  "sometimes"  on  the  average.  The  standard  deviations 
of  the  behavior  frequency  ratings  ranged  from  1.5  to  2.21.  The  range  of  scores  of 
each  item  on  the  Intensity  Scale  was  1  to  7.  The  items  on  the  Intensity  Scale  were 
positively skewed (skewness ranged from .26 to 1.17).

Each  item  was  rated  as  a  problem  between  14%  (has  difficulty  sharing 
materials) and 35% (fails to listen to instructions) of the time. The Problem Scale
items were also positively skewed (skewness ranged from .63 to 2.05).

Internal  Consistency 

SESBI  Intensity  Scale 

Corrected  item-to-total  correlations  between  the  item  intensity  rating  and  the 
Intensity  Scale  score  ranged  from  .43  for  "cries"  to  .82  for  "does  not  obey  school 
rules on his or her own." The mean item-to-total correlation was .60. For the
complete sample, Cronbach's alpha on the Intensity Scale was .98. Cronbach's alpha
was also consistently high across grade, gender, and race (see Table 1).

SESBI  Problem  Scale 

Corrected  item-to-total  correlations  between  the  item  problem  rating  and  the 
Problem  Scale  score  ranged  from  .19  for  "hits  teacher"  to  .71  for  "verbally  fights 
with  other  students."  The  mean  item-to-total  correlation  was  .58. 
For  the  complete  sample,  Cronbach's  alpha  on  the  Problem  Scale  was  .95.  Again, 
across  grade  level,  race,  and  gender  alpha  remained  high  (see  Table  1). 
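
The two internal consistency statistics reported in this section can be computed as sketched below. Cronbach's alpha is k / (k - 1) x (1 - sum of item variances / variance of the total score), and a corrected item-to-total correlation is the correlation between an item and the sum of the remaining items. The function names and example ratings are assumptions for illustration; the analyses in this study were run in statistical software, not with this code.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a children-by-items array of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)

def corrected_item_total(items):
    """Correlation of each item with the total of the remaining items."""
    items = np.asarray(items, dtype=float)
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

# Made-up ratings for three children on two items, NOT data from this study.
alpha_example = cronbach_alpha([[1, 1], [2, 3], [3, 2]])
```

Removing the item from the total before correlating avoids inflating the correlation by the item's overlap with itself, which is why the corrected values are the ones usually reported.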

Table 1

Cronbach's alphas for the Intensity and Problem Scales of the SESBI

                     Intensity Scale    Problem Scale

Total                      .98               .95
ESE Classes                .95               .93
Regular Classes            .98               .96
Male                       .97               .96
Female                     .97               .95
African American           .97               .95
Caucasian                  .98               .95
Kindergarten               .97               .93
First                      .98               .93
Second                     .97               .96
Third                      .98               .96
Fourth                     .97               .96
Fifth                      .98               .97


SESBI-R  Intensity  Scale 

Corrected  item-to-total  correlations  between  the  item  intensity  rating  and  the 
Intensity  Scale  score  ranged  from  .44  for  "fidgets  or  squirms  in  seat"  to  .83  for 
"impulsive,  acts  before  thinking"  and  "does  not  obey  school  rules  on  his  or  her  own." 
The  mean  item-to-total  correlation  was  .73.  For  the  complete  sample,  Cronbach's 
alpha on the Intensity Scale was .98. Internal consistency was high across
demographic  variables  (see  Table  2). 

SESBI-R  Problem  Scale 

Corrected  item-to-total  correlations  between  the  item  problem  rating  and  the 
Problem  Scale  score  ranged  from  .47  for  "whines"  and  "physically  fights  with  other 
students"  to  .73  for  "impulsive,  acts  before  thinking."  The  mean  item-to-total 
correlation  was  .64.  For  the  complete  sample,  Cronbach's  alpha  on  the  Problem 
Scale  was  .96.  Alpha  remained  high  across  demographic  variables. 


Table 2

Cronbach's alphas for the Intensity and Problem Scales of the SESBI-R

                     Intensity Scale    Problem Scale

Total                      .98               .96
ESE Classes                .97               .95
Regular Classes            .98               .97
Male                       .98               .97
Female                     .98               .96
African American           .97               .96
Caucasian                  .98               .97
Kindergarten               .97               .95
First                      .99               .95
Second                     .98               .97
Third                      .98               .98
Fourth                     .97               .97
Fifth                      .98               .97

Test-retest  Reliability 

SESBI 

The test-retest correlation for children in ESE classrooms was r(27) = .92, p
< .001, for the Intensity Scale and r(26) = .92, p < .001, for the Problem Scale.
The paired t tests indicated that the Intensity Scale did not change systematically from
Time 1 to Time 2, t(26) = 1.87, p < .08. The Problem Scale, however, did change
systematically, t(25) = -3.29, p < .01. This represented a decrease of 3 problems,
on average, over the 6 week period.

For the children in regular classrooms, the test-retest correlations were r(75) =
.86, p < .001, and r(70) = .93, p < .001, for the Intensity and Problem Scales,
respectively. The paired t tests indicated that neither the Intensity Scale, t(74) = .68,
p < .50, nor the Problem Scale, t(69) = -1.26, p < .21, changed systematically.

SESBI-R 

The test-retest correlation for the children in ESE classrooms was r(21) = .64,
p < .01, for the Intensity Scale and r(24) = .94, p < .01, for the Problem Scale.
The paired t tests indicated that the Intensity Scale did not change systematically, t(20) =
.28, p < .78. The Problem Scale did change systematically, t(23) = -2.91, p <
.01. This represented a decrease of 2 problems, on average, over the 6 week period.

For the children in regular classrooms, the test-retest correlation was r(75) =
.87, p < .001, for the Intensity Scale and r(72) = .93, p < .001, for the Problem
Scale. The paired t tests indicated that neither the Intensity Scale, t(74) = .44, p <
.66, nor the Problem Scale, t(71) = .08, p < .94, changed systematically.

To explore the lower test-retest correlation for the SESBI-R Intensity Scale
with children in ESE classrooms, the test-retest correlations for the two factors of the
SESBI-R were examined. The test-retest correlation for Factor 1 of the Intensity
Scale in ESE classrooms was r(27) = .92, p < .001. The paired t test indicated that
Factor 1 of the Intensity Scale did change systematically, t(26) = 2.14, p < .05. This
represented a decrease of 7 in the Intensity Scale score over the 6 week period. The
test-retest correlation for Factor 2 of the Intensity Scale in ESE classrooms was
r(27) = .32, p < .11. The paired t test indicated that Factor 2 of the Intensity Scale
did not change systematically, t(26) = .99, p < .33.

For the children in regular classrooms, the test-retest correlation of Factor 1 of
the Intensity Scale was r(75) = .85, p < .001. For Factor 2 of the Intensity Scale
with children in regular classrooms, the test-retest correlation was r(75) = .80, p <
.001. The paired t tests indicated that neither Factor 1, t(74) = .06, p < .95, nor
Factor 2, t(74) = .24, p < .82, changed systematically.

REdSOCS 

Kappa 

Kappa was used to calculate the inter-rater reliability for the REdSOCS. The
mean kappa for the appropriate versus inappropriate category was .65, ranging
from .02 to .90. For the comply versus noncomply category, the mean kappa was .72,
ranging from .06 to 1.00. Finally, the mean kappa for the on-task versus off-task
category was .68, ranging from .29 to .95. The mean kappa levels all fall into the
category of "good" (.60 to .75) as defined by Fleiss (1981).
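
Cohen's kappa corrects the observed proportion of agreement between two coders for the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The sketch below is a generic illustration for interval-by-interval codes; the function name cohens_kappa and the example codes are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical codes.

    Undefined (division by zero) when chance agreement is 1, i.e. when both
    coders used a single identical code throughout.
    """
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Made-up codes for ten intervals, NOT data from this study.
primary = ["on", "on", "on", "off", "off", "on", "off", "on", "on", "off"]
second = ["on", "on", "off", "off", "off", "on", "off", "on", "on", "on"]
kappa = cohens_kappa(primary, second)
```

One property worth keeping in mind is that when one code is rare in a session, chance agreement is high and kappa can be low even if the coders agree on most intervals. This may be relevant to the wide session-to-session ranges reported above, although that possibility was not examined here.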

Demographic  Effects 

For the Offtask category, analysis of variance (see Table 4) revealed a Type of
Classroom x Child Race interaction, with Caucasian children being offtask a greater
percentage of time in ESE classes and African American children being offtask a
greater percentage of time in regular classes (see Table 3). There was also a Grade x
Child Gender x Child Race interaction; however, the pattern of this interaction was
uninterpretable, possibly due to the small number of children in each cell (2-3). The
estimated standard deviation for the model was 12.39. Individual teacher differences
did not affect the observations of children's inappropriate behavior, Z = .78, n.s.


Table 3

Means and Standard Errors of Offtask by Type of Classroom and Race

                       African American       Caucasian
Type of Classroom        M        SE          M        SE

ESE                     21.2      6.5        31.0      7.8
Non-ESE                 29.9      3.7        21.6      3.3

For the Inappropriate category, the analysis of variance (see Table 5) yielded a
Type  of  Classroom  x  Child  Race  interaction  with  Caucasian  children  displaying  more 
inappropriate  behavior  in  ESE  classrooms  and  African  American  children  displaying 
more  inappropriate  behavior  in  regular  classes.  Means  and  standard  errors  for  the 
percentage  of  time  children  engaged  in  inappropriate  behaviors  are  presented  in  Table 
6.  The  standard  deviation  for  the  model  was  6.55. 


Table 4

Analysis of Variance for the Offtask Category

Source                          NDF    DDF    Type III F    Alpha

Grade                             5    [?]      2.[?]        .04
ESE                               1    [?]      .0[?]        [?]
Child Race                        1    [?]      .0[?]        .00
Child Gender                      1     23      1.4[?]       .24
Grade X Child Race                5    [?]      .73          .60
Grade X Child Gender              5     23      3.12         .03
ESE X Child Race                  1     23      7.61         .01
ESE X Child Gender                1     23      [?]          [?]
Child Race X Child Gender         1     23      .04          .84
Grade X C. Race X C. Gender       5     23      [?]          [?]
ESE X C. Race X C. Gender         1     23      3.[?]        [?]


Table 5

Analysis of Variance for the Inappropriate Category

Source                          NDF    DDF    Type III F    Alpha

ESE                               1     13      1.17         [?]
Child Race                        1     38      .29          .59
Child Gender                      1     38      .93          .34
ESE X Child Race                  1     38      5.24         .03
ESE X Child Gender                1     38      .20          .65
Child Race X Child Gender         1     38      .69          .41
ESE X C. Race X C. Gender         1     38      1.47         .23

Table 6

Means and Standard Errors of Inappropriate Category by Type of Classroom and Race

                       African American       Caucasian
Type of Classroom        M        SE          M        SE

ESE                     4.66     2.67       10.66     2.67
Non-ESE                 7.[?]2   1.39        3.52     1.31

For  the  Noncomply  category,  the  analysis  of  variance  (see  Table  7)  yielded  a 
main  effect  for  Grade  with  second  grade  students  being  less  compliant  than  students  in 
other  grades  and  third  grade  students  being  more  compliant  than  students  in  most  other 
grades.  The  means  and  standard  errors  for  the  percentage  of  time  that  the  children 
were  noncompliant  are  shown  in  Table  8.  The  standard  deviation  estimate  for  the 
model  was  20.85. 


Table 7

Analysis of Variance for the Noncomply Category

Source                          NDF    DDF    Type III F    Alpha

Grade                             5     38      2.70         .04
ESE                               1      8      .67          .44
Child Race                        1     38      .02          .88
Child Gender                      1     38      .58          .45
ESE X Child Race                  1     38      .10          .75
ESE X Child Gender                1     38      .35          .55
Child Race X Child Gender         1     38      1.81         .18
ESE X C. Race X C. Gender         1     38      2.62         .11

Table  8 

Means  and  Standard  Errors  for  Noncomply  by  Grade 


Grade         Mean   Standard  Error

Kindergarten  9.26   8.36
First         8.22   7.43
Second        33.87  7.37
Third         0.0    5.57
Fourth        8.99   8.42
Fifth         12.63  7.37

Correlations  between  Behavioral  Observations  and  SESBI  Scores 
In ESE classrooms, the noncomply category was significantly correlated with SESBI
Intensity Scale scores, whereas the off-task and inappropriate behavior categories were
not related to SESBI scores. In regular classrooms, the noncomply category was not
related to either SESBI scale, but both the off-task and inappropriate categories were
significantly correlated with both SESBI scales (see Table 9).

Correlations  between  Behavioral  Observations  and  SESBI-R  Scores 
In ESE classrooms, the noncomply category was significantly correlated with SESBI-R
Intensity Scale scores, whereas the off-task and inappropriate categories were not. In
regular classrooms, noncompliance with teacher requests was not related to SESBI-R
Intensity or Problem Scale scores, but both the off-task and inappropriate categories were
significantly related to both SESBI-R scales (see Table 10). The magnitudes of the
correlations between the SESBI-R and the off-task and inappropriate categories were
similar in ESE and regular classrooms. It is likely that the correlations between the
SESBI-R and the off-task and inappropriate categories in the ESE classrooms were not
significant because the small sample size resulted in a lack of power.
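The point about sample size and power can be illustrated with a short sketch. It uses Fisher's z transformation as a large-sample approximation (an assumption made here for illustration; the study does not state which significance test was applied to the correlations), and the correlation of .45 is a hypothetical value, not one taken from the tables. The two sample sizes match the ESE (n = 12) and regular (n = 48) observation samples.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's z.

    This is a large-sample approximation, not an exact t-test.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical correlation of identical size in two samples of different n.
p_small_sample = correlation_p_value(0.45, 12)
p_large_sample = correlation_p_value(0.45, 48)
print(f"n = 12: p = {p_small_sample:.3f}")
print(f"n = 48: p = {p_large_sample:.4f}")
```

Under this approximation the same correlation falls short of the .05 level in the smaller sample and is well below the .01 level in the larger one, which is the pattern the paragraph above attributes to low power.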

Table  9 

Correlations  between  Behavioral  Observations  and  the  SESBI 


                                   Category
Type  of  Classroom   Inappropriate   Noncomply   Off-task

Regular (a)
  Intensity         .34*            -.13        .48**
  Problem           .52**           -.14        .62**

ESE (b)
  Intensity         .28             .68*        .43
  Problem           .40             .60         .53

Total  Sample (c)
  Intensity         .35**           -.07        .49**
  Problem           .50**           -.08        .62**

(a) n = 48.  (b) n = 12.  (c) n = 60.

*p < .05.  **p < .01.

CBCL-TRF 

Externalizing  Score 

In  the  first  ANOVA  model,  there  was  a  significant  Teacher  effect,  Z  =  3.12,
p  <  .01,  which  accounted  for  34  percent  of  the  variance  in  the  model.  Results
revealed  a  main  effect  for  Type  of  Classroom,  F(1,  42)  =  6.68,  p  <  .01,  with


Table  10 

Correlations  between  Behavioral  Observations  and  the  SESBI-R


                                   Category
Type  of  Classroom   Inappropriate   Noncomply   Off-task

Regular (a)
  Intensity         .33*            -.13        .49**
  Problem           .56**           -.13        .64**

ESE (b)
  Intensity         .33             .68*        .47
  Problem           .40             .64*        .54

Total  Sample (c)
  Intensity         .34**           -.07        .47**
  Problem           .45**           -.09        .60**

(a) n = 48.  (b) n = 12.  (c) n = 60.

*p < .05.  **p < .01.

children  in  ESE  classes  obtaining  higher  scores  than  children  in  regular  classes.  There
was  also  a  significant  Type  of  Classroom  x  Grade  x  Child  Race  interaction,  F(5,  316)
=  3.09,  p  <  .01.

The  significant  results  and  their  components  were  retained  for  the  second 
ANOVA  (see  Table  11).  Again  there  was  a  main  effect  for  Type  of  Classroom,  with 
children  in  ESE  classes  obtaining  higher  scores  than  children  in  regular  classes.  There 
were also main effects for Child Race, with African American children obtaining
higher scores than Caucasian children, and Child Gender, with boys obtaining higher
scores than girls. There was a significant interaction for Type of Classroom X Grade
X Child Race (see Figures 1-6). Means and standard errors of the raw scores are in
Table 12; the estimated standard deviation for the model was 11.27.


Table  11 


Analysis  of  Variance  for  the  Externalizing  Raw  Score  of  the  CBCL-TRF 


Source 


NDF      DDF      Type  III  F  Alpha 


Grade 


ESE


Child  Race 

Child  Gender 

Teacher  Race 

ESE  X  Child  Race 

ESE  X  Grade 

Child  Race  X  Grade 

ESE  X  Teacher  Race 

ESE  X  C.  Race  X  C.  Gender 


5 
5 
1 

5 


342 
42 
342 
342 
42 
342 
342 
342 
42 
342 


1.23 
6.44 
5.56 
15.18 
1.86 
.87 
2.15 
2.17 
2.52 
3.22 


.29 
.02 
.02 
.00 
.18 
.35 
.06 
.06 
.12 
.00 


Fig. 1. Type of Classroom x Race x Grade: Kindergarten

Fig. 2. Type of Classroom x Race x Grade: First Grade

Fig. 3. Type of Classroom x Race x Grade: Second Grade

Fig. 4. Type of Classroom x Race x Grade: Third Grade

Fig. 5. Type of Classroom x Race x Grade: Fourth Grade

Fig. 6. Type of Classroom x Race x Grade: Fifth Grade

Table  12 

Means  and  Standard  Errors  of  the  CBCL-TRF  Externalizing  Raw  Score  by  Grade,
Race,  and  Type  of  Classroom

              Caucasian     Caucasian     African American   African American
              ESE           Regular       ESE                Regular
Grade         M      SE     M      SE     M      SE          M      SE

Kindergarten  17.9   10.7   11.5   3.3    15.5   8.6         18.5   3.4
First         45.9   10.8   7.2    3.9    31.2   8.6         9.2    4.0
Second        14.7   6.3    5.3    3.8    16.6   5.1         10.6   3.7
Third         12.3   4.8    7.1    3.6    15.5   5.1         17.9   3.7
Fourth        13.9   5.8    10.8   3.6    19.9   4.9         13.8   3.9
Fifth         13.3   4.6    9.7    3.3    32.4   4.7         12.0   3.4

For  the  Internalizing  Scale  of  the  CBCL-TRF,  the  ANOVA  model  yielded  a 
significant  teacher  effect,  Z  =  3.14,  p  <  .01,  which  accounted  for  33  percent  of  the 
variance  in  the  model.  There  was  also  a  significant  Type  of  Classroom  x  Grade  x 
Child  Race  interaction,  F(5,  316)  =  2.58,  p  <  .05. 

The significant results and their components were retained for the second
ANOVA (see Table 13). There was a main effect for Type of Classroom, with
children in ESE classrooms obtaining higher scores than those in regular classrooms.
A significant interaction was found for Type of Classroom x Grade x Child Race, but
again there was no interpretable pattern. Means and standard errors for the
Internalizing raw scores are presented in Table 14; the estimated standard deviation
for the model was 5.09.


Table  13 


Analysis  of  Variance  for  the  Internalizing  Raw  Score  of  the  CBCL-TRF 


Source                  NDF   DDF   Type III F   Alpha

Grade                   5     342   2.20         .05
ESE                     1     43    11.85        .00
Child Race              1     342   .73          .39
Child Gender            1     342   .79          .37
Teacher Race            1     43    .03          .87
ESE X Grade             5     342   4.03         .00
Child Race X Grade      5     342   1.70         .13
ESE X Child Race        1     342   0.09         .76
ESE X C. Race X Grade   5     342   2.30         .04

Table  14 

Means  and  Standard  Errors  of  the  CBCL-TRF  Internalizing  Raw  Score  by  Grade,
Race,  and  Type  of  Classroom

              Caucasian     Caucasian     African American   African American
              ESE           Regular       ESE                Regular
Grade         M      SE     M      SE     M      SE          M      SE

Kindergarten  4.8    4.7    4.6    1.5    5.6    3.7         5.4    1.5
First         18.3   4.7    2.4    1.7    22.3   3.7         2.5    1.8
Second        4.2    2.9    4.0    1.7    8.9    2.3         4.9    1.6
Third         9.3    1.9    6.0    1.6    2.9    2.1         7.3    1.7
Fourth        7.9    2.5    5.5    1.6    6.2    2.1         6.7    1.7
Fifth         10.9   1.9    5.9    1.5    12.0   1.9         6.4    1.5

Relationship  with  the  SESBI 

The  Intensity  and  Problem  Scales  both  correlated  highly  with  the  externalizing 
raw  score  of  the  CBCL-TRF  in  regular  and  ESE  classrooms  (see  Table  15).  The 
internalizing  raw  score  of  the  CBCL-TRF  was  not  related  to  the  SESBI  scales  in  the 
ESE  classrooms.  The  internalizing  raw  score  was  moderately  related  to  the  SESBI 
scales  in  regular  classrooms.  The  externalizing  raw  score  correlations,  however,  were
significantly  higher  for  both  the  Intensity  (z  =  11.24,  p  <  .0001)  and  Problem  (z  =
10.56,  p  <  .0001)  scales.
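A comparison of this kind involves two correlations that share one variable (a SESBI scale correlated with both the externalizing and the internalizing score in the same children). The text does not say which test produced the reported z values, so the sketch below should be read only as one common option: the z test for correlated correlation coefficients described by Meng, Rosenthal, and Rubin (1992). The function name and the input values are illustrative assumptions, and the result is not expected to reproduce the z statistics reported above.

```python
import math


def dependent_correlation_z(r1: float, r2: float, r12: float, n: int) -> float:
    """z test for two dependent correlations that share one variable.

    r1  : correlation of the shared variable with the first measure
    r2  : correlation of the shared variable with the second measure
    r12 : correlation between the first and second measures
    n   : sample size
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    mean_r_squared = (r1 ** 2 + r2 ** 2) / 2
    # f is capped at 1, as in the published procedure.
    f = min((1 - r12) / (2 * (1 - mean_r_squared)), 1.0)
    h = (1 - f * mean_r_squared) / (1 - mean_r_squared)
    return (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - r12) * h))


# Illustrative inputs only (hypothetical, chosen to resemble a large sample).
z = dependent_correlation_z(r1=0.85, r2=0.43, r12=0.41, n=371)
print(f"z = {z:.2f}")
```

A positive z larger than 1.96 in absolute value would indicate, at the two-sided .05 level, that the first correlation is reliably larger than the second.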

Relationship  with  the  SESBI-R

The  pattern  of  results  was  the  same  for  the  SESBI-R  as  for  the  SESBI.  Both 
the  Intensity  and  Problem  Scales  correlated  highly  with  the  externalizing  raw  score  of 
the  CBCL-TRF  in  regular  and  ESE  classrooms.    The  internalizing  raw  score  of  the 
CBCL-TRF  was  not  related  to  the  SESBI-R  scales  in  the  ESE  classrooms,  but  was 
moderately  related  to  the  SESBI-R  scales  in  regular  classrooms  (see  Table  15).  Again,
the  externalizing  raw  score  correlations  were  significantly  higher  for  both  the  Intensity
(z  =  11.61,  p  <  .0001)  and  Problem  (z  =  9.48,  p  <  .0001)  Scales.

The  same  pattern  was  found  when  examining  the  relationship  between 
internalizing  and  externalizing  scores  for  the  CBCL-TRF  as  was  found  with  the  SESBI 
scales  and  the  internalizing  and  externalizing  scores.  The  internalizing  raw  score  of 
the  CBCL-TRF  was  not  related  to  the  externalizing  raw  score  of  the  CBCL-TRF  in 
the  ESE  classrooms.  The  internalizing  raw  score  was  moderately  related  to  the 
externalizing  raw  score  in  regular  classrooms. 


Table  15 

Relationship  between  the  SESBI  and  SESBI-R  Scale  Scores  and  the  CBCL-TRF 


External 

Internal 

ESE" 

Regular'' 

Total

ESE' 

Total' 

Intensity 

.87** 

.85** 

.11 

.48** 

.43** 

Problem 

.66** 

.81** 

.80** 

.10 

.36** 

SESBI 

Intensity 

.88** 

.85** 

.14 

.49**

.45** 

Problem 

.66** 

.82** 

.80** 

.12 

.39**

.36** 

CBCL-TRF 
External 

.19 

.49**

.41 

(a) n = 82.  (b) n = 289.  (c) n = 371.

**p < .01.

Effects  of  Demographic  Variables  on  the  SESBI  Intensity  Scale 
Approximately  21  percent  of  the  variance  in  the  model  was  accounted  for  by 
random  Teacher  effects,  Z  =  2.77,  p  <  .01,  suggesting  that  about  a  fifth  of  the 
variability  was  due  to  individual  teacher  differences.  The  initial  ANOVA  yielded  a 
significant  main  effect  for  Type  of  Classroom  (regular  versus  ESE)  and  a  Type  of
Classroom  X  Child  Gender  X  Child  Race  X  Teacher  Race  interaction.
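The share of variance attributed to a random Teacher effect can be read as the teacher variance component divided by the total of the teacher and residual components. The sketch below shows that arithmetic together with a Wald-type Z (estimate divided by its standard error). The variance components and standard error are hypothetical numbers chosen for illustration, since the study reports only the resulting percentage and Z, and the function names are assumptions rather than the software actually used.

```python
def variance_share(var_teacher: float, var_residual: float) -> float:
    """Proportion of total variance attributable to the teacher component."""
    total = var_teacher + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_teacher / total


def wald_z(estimate: float, standard_error: float) -> float:
    """Wald-type Z statistic for a variance component estimate."""
    return estimate / standard_error


# Hypothetical variance components (not taken from the study).
share = variance_share(var_teacher=210.0, var_residual=790.0)
z_teacher = wald_z(estimate=210.0, standard_error=75.0)
print(f"Teacher share of variance: {share:.0%}")
print(f"Wald Z: {z_teacher:.2f}")
```

With these made-up inputs the teacher component would account for about a fifth of the variance, which is the kind of statement made in the paragraph above.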

In  the  second  ANOVA  model  (see  Table  16)  only  significant  effects  and  their 
component  parts  were  retained  for  analysis.  Again  there  was  a  significant  random 
Teacher  effect,  Z  =  3.13,  p  <  .01,  which  accounted  for  approximately  23  percent  of 


Table  16 

Analysis  of  Variance  Model  with  the  SESBI  Intensity  Scale 


Source                                NDF   DDF           Type III F    Alpha

ESE                                   [illegible]  [illegible]  [illegible]  .05
Child Race                            1     347           .24           .62
Child Gender                          1     347           [illegible]   .48
Teacher Race                          1     48            [illegible]   .43
ESE X Child Race                      1     347           .93           .34
ESE X Child Gender                    1     347           1.18          .28
C. Race X C. Gender                   1     347           [illegible]   .63
C. Race X T. Race                     1     [illegible]   12.32         .00
ESE X T. Race                         1     48            .99           .33
C. Gender X T. Race                   1     347           9.24          .00
C. Race X C. Gender X T. Race         1     347           1.69          .19
ESE X C. Race X C. Gender             1     347           .01           .94
ESE X C. Race X T. Race               1     347           .18           .67
ESE X C. Gender X T. Race             1     347           2.97          .09
ESE X C. Gender X C. Race X T. Race   1     347           7.82          .00



the variance in the model. Significant fixed effects included a main effect for Type of
Classroom, with children in ESE classes receiving higher scores than children in
regular classrooms (see Figure 7), and a Type of Classroom X Child Gender X Child
Race X Teacher Race interaction (see Figure 8). Because of the small number of African
American teachers in ESE classrooms (2), only the regular classroom data were
considered. In this interaction, teachers rated boys of their own race as having fewer
behavior problems than boys of the other race. Also, African American teachers
rated Caucasian girls as displaying more behavior problems than Caucasian teachers
did. Finally, African American and Caucasian teachers rated African American girls
similarly.

Fig. 7. Type of Classroom Main Effect

Fig. 8. Race x Gender x Teacher Race Interaction (Regular Classrooms)

Effects  of  Demographic  Variables  on  the  SESBI  Problem  Scale 

In the initial ANOVA there was a significant main effect for Type of Classroom
and a significant interaction for Type of Classroom X Child Race X Child Gender X
Teacher Race. There was a large random Teacher effect, Z = 3.48, p < .001, which
accounted for 51 percent of the variance in the model.

In the second ANOVA model (see Table 17), there was a significant random
Teacher effect, Z = 3.94, p < .0001, accounting for approximately 52 percent of the
variance in the model. There was a significant main effect for Type of Classroom,
with children in ESE classrooms obtaining higher SESBI Problem scores than children
in regular classrooms (see Figure 9). There was also a significant Type of Classroom X
Child Race X Child Gender X Teacher Race interaction (see Figure 10). Again, because
of the small number of African American ESE teachers, only the regular classroom
results were considered meaningful. The interaction indicated that teachers rated
children, both boys and girls, of their own race as having fewer behavior problems.
This was particularly true for African American girls rated by African American teachers.

Effects  of  Demographic  Variables  on  the  SESBI-R  Intensity  Scale

In the initial ANOVA model containing all of the variables and their
interactions, there was a significant effect for Teacher, Z = 2.94, p < .01. The
Teacher variable accounted for approximately 25 percent of the variance in the model.
There was also a significant effect for Type of Classroom, F(1, 42) = 4.17, p < .05.
A Type of Classroom X Child Gender X Child Race X Teacher Race interaction,
F(1, 316) = 3.70, p < .06, was near significance and was therefore retained for the
second model.

Fig. 10. Race x Gender x Teacher Race Interaction (Regular Classrooms)


Table  17 

Analysis  of  Variance  Model  of  Fixed  Effects  on  the  SESBI  Problem  Scale 


Source                                NDF   DDF   Type III F   Alpha

ESE                                   1     49    4.99         .03
Child Race                            1     346   .85          .36
Child Gender                          1     346   .11          .74
Teacher Race                          1     346   .04          .84
ESE X Child Race                      1     346   .09          .77
ESE X Child Gender                    1     346   6.75         .00
C. Race X C. Gender                   1     346   .01          .94
C. Race X T. Race                     1     346   6.49         .01
ESE X T. Race                         1     346   .02          .89
C. Gender X T. Race                   1     346   5.60         .02
C. Race X C. Gender X T. Race         1     346   5.70         .02
ESE X C. Race X C. Gender             1     346   .48          .49
ESE X C. Race X T. Race               1     346   1.97         .16
ESE X C. Gender X T. Race             1     346   8.61         .00
ESE X C. Gender X C. Race X T. Race   1     346   4.55         .03


In the second ANOVA model only significant effects and their component
parts were retained for analysis (see Table 18). Again there was a significant Teacher
effect, Z = 3.25, p < .001, which accounted for approximately 25 percent of the
variance in the model. Again there was a main effect for Type of Classroom,
F(1, 49) = 4.61, p < .05 (see Figure 11), with children in ESE classes obtaining higher
SESBI-R scores than children in regular classrooms. The Type of Classroom X Child
Gender X Child Race X Teacher Race interaction was also significant, F(1, 346) = 4.21,
p < .05 (see Figure 12). Because of the small number of African American teachers in
ESE classrooms (2), only the regular classroom data were considered. In this interaction,
teachers rated boys of their own race as having fewer behavior problems than boys of
the other race. Also, African American teachers rated Caucasian girls as displaying
more behavior problems than Caucasian teachers did. Finally, African American and
Caucasian teachers rated African American girls similarly.

Effects  of  Demographic  Variables  on  the  SESBI-R  Problem  Scale 
In  the  initial  ANOVA  with  all  variables  and  their  interactions,  a  Child  Gender  X
Child  Race  interaction,  F(1,  316)  =  3.94,  p  <  .05,  and  a  Type  of  Classroom  X
Child  Gender  X  Teacher  Race  interaction,  F(1,  316)  =  6.50,  p  <  .01,  were
significant.  An  ESE  main  effect  approached  significance,  F(1,  42)  =  2.79,  p  <  .10,


Fig. 11. Type of Classroom Main Effect

Fig. 12. Race x Gender x Teacher Race Interaction (Regular Classrooms)


Table  18 

Analysis  of  Variance  Model  of  Fixed  Effects  on  the  SESBI-R  Intensity  Scale 


Source                                NDF   DDF   Type III F   Alpha

ESE                                   1     49    4.61         .42
Child Race                            1     346   .12          .73
Child Gender                          1     346   .65          .42
Teacher Race                          1     42    1.10         .29
ESE X Child Race                      1     346   .5           .48
ESE X Child Gender                    1     346   1.44         .23
C. Race X C. Gender                   1     346   .25          .62
C. Race X T. Race                     1     346   9.27         .00
ESE X T. Race                         1     346   1.86         .17
C. Gender X T. Race                   1     346   5.83         .02
C. Race X C. Gender X T. Race         1     346   .51          .47
ESE X C. Race X C. Gender             1     346   .00          .95
ESE X C. Race X T. Race               1     346   .01          .94
ESE X C. Gender X T. Race             1     346   1.35         .25
ESE X C. Gender X C. Race X T. Race   1     346   4.21         .05

and was therefore retained in the second model. There was a large Teacher effect, Z
= 3.61, p < .001, which accounted for as much as 58 percent of the variance in the
model.

In the second model (see Table 19), there was a significant Teacher effect, Z =
4.01, p < .0001, accounting for approximately 57 percent of the variance in the
model. There was also a significant Type of Classroom X Child Gender X Teacher
Race interaction, F(1, 352) = 5.30, p < .05 (see Figure 13).


Table  19 

Analysis  of  Variance  Model  of  Fixed  Effects  on  the  SESBI-R  Problem  Scale 


Source                      NDF   DDF   Type III F   Alpha

ESE                         1     49    3.06         .09
Child Race                  1     352   1.32         .25
Child Gender                1     352   .80          .37
Teacher Race                1     49    .30          .59
ESE X C. Gender             1     352   3.76         .05
C. Race X T. Race           1     352   3.25         .07
ESE X T. Race               1     49    .25          .62
C. Gender X T. Race         1     352   2.72         .10
ESE X C. Gender X T. Race   1     352   5.30         .02

Cut  off  Scores 

SESBI 

The cut off scores were obtained using the score distributions for the two types of
classrooms (regular and ESE). Because the variances did not differ significantly for
the two groups, the midpoint between the two group means was taken as the cut point.
The cut point for the SESBI Intensity Scale was determined to be 96. For the SESBI
Problem Scale the cut point was 9.
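The midpoint rule described above can be sketched in a few lines. The score lists are hypothetical values invented for illustration, the function names are assumptions, and treating a score at or above the cut point as "identified" is also an assumption, since the text does not say whether the cut point itself counts.

```python
from statistics import mean


def midpoint_cut_point(regular_scores, ese_scores):
    """Cut point halfway between the two group means (equal variances assumed)."""
    return (mean(regular_scores) + mean(ese_scores)) / 2


def percent_identified(scores, cut_point):
    """Percentage of scores at or above the cut point (assumed convention)."""
    flagged = sum(1 for score in scores if score >= cut_point)
    return 100 * flagged / len(scores)


# Hypothetical scale scores for the two classroom types (not study data).
regular = [60, 70, 80, 90, 100]
ese = [100, 110, 120, 130, 140]

cut = midpoint_cut_point(regular, ese)
print(f"Cut point: {cut}")
print(f"ESE identified: {percent_identified(ese, cut):.0f}%")
print(f"Regular identified: {percent_identified(regular, cut):.0f}%")
```

The same two functions could be applied to each scale in turn to obtain a cut point and the percentage of children identified in each type of classroom.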

Using the cut off scores derived in this study, the SESBI Intensity Scale
identified 73% of the children in ESE classes as having significant behavior problems.
The SESBI Problem Scale cut off score identified 60% of children in ESE classrooms
as having behavior problems. The SESBI Intensity Scale identified 46% of children in
regular classrooms as having significant behavior problems, and the Problem Scale
identified 24% of the children as having significant behavior problems.

SESBI-R 

The cut point for the SESBI-R Intensity Scale was determined to be 110. For
the SESBI-R Problem Scale the cut point was 10. Using the cut off scores, the
SESBI-R Intensity Scale identified 73% of the children in ESE classrooms as having
significant behavior problems. The SESBI-R Problem Scale identified 63% of the
children in ESE classrooms as having significant behavior problems. The SESBI-R
Intensity Scale identified 47% of the children in regular classrooms as having problems.
The SESBI-R Problem Scale identified 27% of children in regular classrooms as having
significant behavior problems.

CBCL-TRF 

Using  the  CBCL-TRF  scores  with  a  T  score  over  70,  which  is  defined  as 
clinically  significant,  33%  of  children  in  ESE  classes  were  identified  as  having 
behavior  problems.  Fifteen  percent  of  children  in  regular  classrooms  were  identified 
as  having  significant  behavior  problems  using  this  same  cut  off  score. 

Factor  Analyses  of  the  SESBI-R 

A principal components analysis was conducted, followed by an oblique
(oblimin) rotation. Since the purpose of this factor analysis was to replicate results found
previously by Rayfield and Eyberg (1993), a two factor solution was specified in the
analysis. Together, the two factors accounted for 67 percent of the variance.
The first unrotated factor accounted for 61% of the variance and the second unrotated
factor accounted for 6% of the variance. Table 20 presents the results from the two
factor solution. Factor one contained 29 oppositional items (e.g., has temper
tantrums, pouts, acts defiant). The second factor contained 9 items that relate to
attentional difficulties (e.g., is easily distracted, fails to finish tasks or projects, and
has difficulty staying on task). The correlation between the two factors was .65.
Cronbach's alpha was .98 for the ODD Factor and .95 for the ADHD Factor. The
mean scores and standard deviations are shown in Table 21.
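For readers unfamiliar with the internal consistency statistic reported for each factor, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The ratings are hypothetical and the function name is an assumption; the study's own computation was presumably done in a statistical package.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of equal-length lists, one list of ratings per item,
    with respondents in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent across all items.
    totals = [sum(ratings) for ratings in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical ratings for three items from four respondents.
ratings = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]
alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values close to 1, such as those reported for the two factors, indicate that the items within a factor rise and fall together across children.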

For  children  in  regular  classrooms,  the  Attention  Problems  narrow-band
syndrome  scale  of  the  CBCL-TRF  correlated  significantly  with  the  ODD  Factor,  r(82) 
=  .70,  p  <  .001,  and  the  ADHD  Factor,  r(87)  =  .86,  p  <  .001.  The  correlation, 
however,  was  significantly  stronger  with  the  ADHD  Factor  than  the  ODD  Factor,  Z 
=  8.57,  p  <  .0001.  In  addition,  both  the  ADHD  and  ODD  Factors  correlated 
significantly  with  the  Off-task  category  of  the  REdSOCS,  r(48)  =  .51,  p  <  .001,  and 
r(48)  =  .45,  p  <  .001,  respectively.  These  correlations  were  not  statistically 
different,  Z  =  .93,  p  <  .17. 

Table  21

Means  and  Standard  Deviations  for  Factors  1  and  2  of  the  SESBI-R

           ESE                              Regular
           N     Mean   Standard Deviation  N     Mean   Standard Deviation

Factor 1   97    105    37.54               306   70     40.08
Factor 2   100   44     15.84               306   32     18.68


For children in ESE classrooms, the Attention Problems narrow-band
syndrome scale of the CBCL-TRF correlated significantly with the ODD Factor, r(82)
= .52, p < .001, and the ADHD Factor, r(87) = .83, p < .001. The correlation,
however, was significantly stronger with the ADHD Factor than the ODD Factor, Z
= 4.55, p < .0001. In addition, both the ADHD and ODD Factors correlated
significantly with the Off-task category of the REdSOCS, r(48) = .57, p < .001, and
r(48) = .39, p < .001, respectively. These correlations, however, were not
statistically different, Z = .73, p < .23.


Fig. 13. Type of Classroom x Gender x Teacher Race Interaction


Table  20 

Factor  Loadings  on  the  Intensity  Scale  of  the  SESBI-R 


Item                                                   Factor 1      Factor 2

1. Has temper tantrums                                 [illegible]   .23
2. Pouts                                               [illegible]   .37
3. Teases or provokes other students                   [illegible]   -.12
4. Lies                                                [illegible]   .26
5. Acts frustrated with difficult tasks                .14           [illegible]
6. Does not obey school rules on his/her own           [illegible]   .26
7. Demands teacher attention                           [illegible]   .15
8. Dawdles in obeying rules or instructions            .42           [illegible]
9. Acts bossy with other students                      [illegible]   -.27
10. Gets angry when doesn't get his/her own way        [illegible]   .09
11. Interrupts teacher                                 [illegible]   .08
12. Impulsive, acts before thinking                    [illegible]   .17
13. Refuses to obey until threatened with punishment   [illegible]   .17
14. Has difficulty staying on task                     .12           [illegible]
15. Blames others for problem behaviors                [illegible]   .29
16. Has difficulty entering groups                     .29           .54
17. Is easily distracted                               .16           [illegible]
18. Has difficulty accepting criticism or correction   [illegible]   .35
19. Fails to finish tasks or projects                  -.06          [illegible]
20. Sasses teacher                                     [illegible]   -.22
21. Verbally fights with other students                [illegible]   -.16
22. Whines                                             [illegible]   .33
23. Is overactive or restless                          [illegible]   .16
24. Physically fights with other students              [illegible]   -.19
25. Makes noises in class                              [illegible]   .09
26. Acts defiant when told to do something             [illegible]   .23
27. Argues with teachers about rules or instructions   [illegible]   .10
28. Interrupts other students                          [illegible]   .03
29. Is noisy                                           [illegible]   -.02
30. Has trouble awaiting turn                          [illegible]   .07
31. Talks excessively                                  [illegible]   .05
32. Loses things needed for school activities          -.09          [illegible]
33. Fidgets or squirms in seat                         [illegible]   .35
34. Fails to listen to instructions                    .22           [illegible]
35. Is touchy or easily annoyed                        [illegible]   .22
36. Bothers others on purpose                          [illegible]   -.09
37. Has trouble paying attention                       .17           [illegible]
38. Has difficulty staying seated                      [illegible]   .33


DISCUSSION 

Results  of  this  study  support  the  psychometric  strength  of  both  the  SESBI  and 
SESBI-R  for  assessing  teachers'  perceptions  of  behavior  problems  in  grade  school 
children.  Although  the  results  were  similar  in  many  respects,  there  are  some 
differences  in  the  psychometric  properties  of  the  two  measures.  The  differences  and 
their  implications  will  be  reviewed. 

Item  analyses  of  the  SESBI  and  SESBI-R  suggested  that  most  items  on  the 
measures  occur  "seldom  to  sometimes."  Further  examination  revealed  that  the  items
on  the  SESBI-R  are  less  positively  skewed  than  the  items  on  the  SESBI,  providing 
greater  variability  in  teacher  response  than  items  on  the  SESBI.  The  superior  item 
strength  of  the  SESBI-R  items  is  not  surprising  given  that  a  large  part  of  the 
development  of  the  SESBI-R  was  based  on  previous  item  analyses. 

The  internal  consistency  of  both  the  SESBI  and  SESBI-R  is  exceptionally  high 
and  does  not  differ  based  on  demographic  characteristics  such  as  gender,  grade,  or 
race.  This  high  internal  consistency  across  demographic  subgroups  indicates  that  the 
SESBI  and  SESBI-R  are  applicable  to  diverse  groups  of  grade  school  children.  The 
results  suggest  that  the  SESBI  and  SESBI-R  are  homogeneous  measures  of  disruptive 
behaviors  in  the  classroom. 

Teachers in this sample reported more behavior problems on both the SESBI and
SESBI-R than were found in previous studies (Burns & Owen, 1990; Rayfield & Eyberg,
1996) with teachers in rural areas of the United States. However, the scores were
comparable to scores obtained by Funderburk and Eyberg (1989) with slightly younger
children in Gainesville, a more urban area than those of the other studies. It is possible
that urban school districts or particular parts of the country produce higher mean scores
on the SESBI. This is consistent with earlier studies documenting greater prevalence of
behavioral problems in urban children (Rutter, Cox, Tupling, Berger, & Yule, 1975;
Connell, Irvine, & Rodney, 1982). As noted by Eyberg (1992), it appears that it may
be important to compare students' SESBI and SESBI-R scores to those of their peers in a
similar geographic area.

The  high  test-retest  reliability  coefficients  with  children  in  ESE  and  regular 
classrooms  indicate  that  the  SESBI  is  stable  over  time.  Consistent  with  other  rating 
scales  (Barkley,  1990),  it  appears  there  is  a  small  decrease  in  number  of  problems 
endorsed  by  teachers  on  second  administration  of  the  SESBI  with  children  in  ESE 
classes.  Whether  this  finding  is  due  to  the  SESBI  or  to  an  actual  improvement  in
behavior  of  the  children  is  still  unclear.  The  timing  of  the  retest  was  at  the  end  of  the 
school  year.  It  is  possible  that  teachers  are  more  tolerant  at  this  time  of  year.  In 
addition,  children  in  ESE  classes  receive  more  individual  attention  and  problem 
behaviors  tend  to  be  focused  on  in  their  individual  educational  programs.  Therefore, 
classroom  interventions  could  also  have  resulted  in  the  decrease  in  problems. 
Nevertheless,  this  decrease  suggests  that  a  second  administration  of  the  SESBI  could 
be  important  for  clinical  use.  There  was  no  systematic  change  in  SESBI  scores  for  the 
Intensity  Scale  in  ESE  classes  nor  the  Intensity  or  Problem  Scales  in  regular  classrooms. 


As  with  the  SESBI,  the  high  test-retest  reliability  coefficients  with  children  in 
ESE  and  regular  classrooms  indicate  that  the  SESBI-R  is  also  stable  over  time.  The 
Intensity  Scale  of  the  SESBI-R  with  children  in  ESE  classrooms,  however,  was 
somewhat less stable over time than the Intensity Scale of the SESBI. This decreased
stability was due to the greater number of items focusing on ADHD symptoms on the
SESBI-R than on the SESBI. Factor 2, the ADHD factor, was found to be far less reliable
over time than Factor 1, the ODD factor. These ADHD-type symptoms may be more
variable in their presentation or less behaviorally specific than items dealing with
oppositional behavior, thus leading to less stability. Also, the ADHD items may be
less salient in the classroom than the ODD items. Similar to the SESBI, there is a
systematic decrease in problems in ESE children at second administration on the
SESBI-R.  The  systematic  decrease  in  Problem  Scale  scores  in  ESE  classes  is  smaller 
with  the  SESBI-R  than  with  the  SESBI.  There  is  no  systematic  change  in  SESBI-R 
scores for the Intensity Scale in ESE classes or for the Intensity or Problem Scales in
regular  classrooms.  Together,  these  results  suggest  that  the  SESBI  and  SESBI-R  both 
have  satisfactory  stability  over  time. 
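
The distinction drawn above between stability (the test-retest coefficient) and systematic change (the drop in problems endorsed at retest) can be illustrated with a minimal sketch. The scores below are hypothetical and are not data from this study; the function names are illustrative only. The sketch shows that a scale can have a very high test-retest correlation while every child's score still shifts downward at the second administration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_change(time1, time2):
    """Average retest-minus-test difference; negative means scores dropped."""
    return sum(b - a for a, b in zip(time1, time2)) / len(time1)


# Hypothetical Problem Scale scores for six children (not study data).
time1 = [12, 5, 20, 8, 15, 3]
time2 = [10, 4, 18, 7, 13, 2]

stability = pearson_r(time1, time2)   # rank order is preserved, so r is high
shift = mean_change(time1, time2)     # yet the average score decreases
print(f"test-retest r = {stability:.3f}, mean change = {shift:+.2f}")
```

Because the correlation reflects only the preservation of relative standing, a separate comparison of means is needed to detect a systematic decrease of this kind.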

Behavioral observations with the REdSOCS indicated that for the Offtask and
Inappropriate categories, African American children were generally more off-task and
showed more inappropriate behavior in regular classrooms, while Caucasian children
were more off-task and displayed more inappropriate behavior in ESE classrooms.
For the Compliance category, there were large differences in compliance rates by
grade, although the pattern was not readily interpretable. More research is needed to
establish normative compliance levels at each grade.

The  patterns  of  correlations  between  the  SESBI  and  the  REdSOCS  and  the 
SESBI-R  and  the  REdSOCS  are  similar.  In  regular  classrooms,  the  SESBI  Intensity 
and  Problem  Scales  are  both  related  to  observed  inappropriate  behavior  and  off-task 
behavior.  There  was  no  relationship  between  noncompliance  to  teacher  commands 
and  the  SESBI  scales  in  regular  classrooms. 

The  significant  relationship  between  the  Intensity  Scale  of  the  SESBI  and 
noncompliance  in  ESE  classrooms  indicates  that  children  with  higher  SESBI  scores  are 
less  compliant  with  teacher  commands.  The  lack  of  a  significant  correlation  between 
the  SESBI  Problem  Scale  and  noncompliance  in  ESE  classrooms  was  likely  due  to 
insufficient  power  because  of  the  small  sample  of  children  in  ESE  classrooms 
observed.  Despite  the  similar  strength  of  the  correlations  between  the  SESBI  scales 
and  inappropriate  behavior  and  off-task  behavior  in  the  two  types  of  classrooms 
(regular and ESE), the relationships of inappropriate behavior and off-task
behavior to the SESBI scales are not significant in ESE classrooms. Again, this is
likely  due  to  a  lack  of  statistical  power  in  the  correlational  analyses  with  the  ESE 
classrooms. 
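
The power argument above can be made concrete with a short sketch. It uses the Fisher z approximation for testing whether a correlation differs from zero, which is a stand-in chosen for illustration and not necessarily the test used in this study; the values of r and n are hypothetical. The same correlation is significant in a large sample and nonsignificant in a small one.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values (not study data): identical r, different sample sizes.
r = 0.35
p_large = correlation_p_value(r, n=100)  # e.g., a large regular-classroom sample
p_small = correlation_p_value(r, n=20)   # e.g., a small ESE sample

print(f"n = 100: p = {p_large:.4f}")
print(f"n =  20: p = {p_small:.4f}")
```

Under these assumed values, the correlation of .35 falls below the .05 level only with the larger sample, which is the pattern described for the regular and ESE classrooms.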

Overall, the correlations between the REdSOCS categories and the SESBI-R
were slightly higher than those with the SESBI. Again, in regular classrooms, the
SESBI-R Intensity and Problem Scales are both related to observed inappropriate
behavior and off-task behavior. There was no relationship between noncompliance to
teacher commands
and  the  SESBI-R  scales  in  regular  classrooms. 

For the SESBI-R, both the Intensity and Problem Scales correlate significantly
with noncompliance in ESE classrooms, indicating that ESE teachers rate
children who are less compliant as displaying more frequent behavior problems and
being more of a problem for the teacher to handle. Again, despite the similar strength
of the correlations between the SESBI-R scales and inappropriate behavior and off-task
behavior in the two types of classrooms (regular and ESE), the relationships of
inappropriate behavior and off-task behavior to the SESBI-R scales are not significant
in ESE classrooms. This, too, is likely due to a lack of statistical power in the
correlational analyses with the ESE classrooms.

The strong correlations between the REdSOCS and the SESBI and SESBI-R in
this sample were higher than is typically found when comparing data obtained by
multiple methods (observation, rating scale) and multiple informants (teachers,
observers) (Achenbach, McConaughy, & Howell, 1987). The pattern of correlations
between the SESBI, SESBI-R, and the REdSOCS provides clear multi-method
concurrent validity for the SESBI.

Further  concurrent  and  discriminant  validity  is  provided  by  the  correlations  of 
the  SESBI  and  SESBI-R  with  the  CBCL-TRF  externalizing  and  internalizing  scales. 
Since  the  resulting  correlations  for  both  forms  of  the  SESBI  are  the  same,  the  results 
will be discussed together. Both forms of the SESBI are highly correlated with the
CBCL-TRF externalizing scores in regular and ESE classrooms. In fact, in regular
classrooms, the correlations between the SESBI Intensity Scales and the CBCL-TRF
externalizing  score  are  strong  enough  to  suggest  that  the  scales  are  redundant  (Kazdin, 
Mazurick,  &  Bass,  1993).  The  SESBI  scales  do  not  correlate  with  the  CBCL-TRF 
internalizing  score  in  ESE  classrooms,  demonstrating  discriminant  validity  with  these 
children.  There  are  moderate  correlations  between  the  CBCL-TRF  internalizing  score 
and the SESBI scales in regular classrooms. These moderate correlations of the
SESBI scores with the Internalizing broad band may be due, in part, to children in
regular classrooms having either very few or a substantial number of both types of
symptoms, thus resulting in a higher correlation. In both types of classrooms, the
externalizing score correlated significantly more highly with the SESBI scales than did
the internalizing score, thus providing discriminant validity. In addition, the same
pattern of correlations was found between the CBCL-TRF internalizing and
externalizing scales. In ESE classrooms, the Internalizing and Externalizing broad bands
were not related, while in the regular classrooms the two broad bands were moderately
correlated.

The  main  effects  for  classroom  type  (ESE  or  Regular)  for  both  the  Intensity 
and  Problem  Scales  of  the  SESBI  confirm  the  hypothesis  that  children  in  ESE  classes 
obtain higher SESBI scores. This result is not surprising given that children are
usually referred to ESE classes due to behavior problems that cannot be controlled in
the regular classroom environment.

There were numerous effects of demographic variables such as child race, child
gender, and, contrary to previous findings, teacher race on the SESBI Intensity Scale.
There were no grade effects on any of the SESBI scales. The 4-way interaction for
type of classroom, child gender, child race, and teacher race was statistically
significant for the SESBI Intensity Scale; however, the number of children in some
cells of the ESE classrooms was very small (as low as 4), thus making the replication
of such an interaction unlikely. Therefore, only the regular classrooms were
considered. In this
interaction,  teachers  rated  boys  of  their  own  race  as  having  fewer  behavior  problems. 
Caucasian  girls  were  rated  as  having  fewer  behavior  problems  by  Caucasian  teachers 
and  African  American  girls  were  rated  similarly  by  Caucasian  and  African  American 
teachers.  The  differential  SESBI  scores  by  race  are  somewhat  supported  by  behavioral 
observation  data  which  revealed  more  behavior  problems  in  African  American  children 
than  in  Caucasian  children  in  regular  classrooms. 

On  the  Problem  Scale  of  the  SESBI,  there  was  a  significant  effect  for  type  of 
classroom,  with  ESE  children  obtaining  higher  SESBI  scores  than  children  in  regular 
classrooms. There was also a 4-way interaction of type of classroom, child gender,
child race, and teacher race for the Problem Scale of the original SESBI. As noted
above, the number of subjects in each cell in the ESE classes was relatively small, thus
making the results of the 4-way interaction questionable and unlikely to be replicated.
In regular classes, teachers rated children of their own race as having fewer behavior
problems. This was most pronounced with African American girls. It should be
noted, however, that the absolute difference between the groups was very small (1
problem). Thus, although these differences are statistically significant, they may
not be clinically meaningful.


For the SESBI-R, the main effect for classroom type (ESE or Regular) for the
Intensity Scale confirms the hypothesis that children in ESE classes obtain higher
SESBI-R scores. The 4-way interaction for type of classroom, child gender, child race,
and teacher race was statistically significant for the SESBI-R Intensity Scale. The
pattern of results was the same as that of the SESBI Intensity Scale.

For the Problem Scale of the SESBI-R, the main effect of classroom type (ESE
or regular) did not reach statistical significance, although there was a trend for
children in ESE classes to have higher SESBI-R Problem Scale scores than children in
regular classrooms.

In  all  the  ANOVA  models,  individual  teacher  differences  accounted  for  a 
significant  amount  of  the  variance,  suggesting  that  rating  scale  scores  are  dependent, 
in  part,  on  the  individual  teacher  completing  them.  Individual  teacher  differences 
accounted  for  approximately  a  quarter  of  the  variance  for  the  Intensity  Scales  of  the 
SESBI  and  SESBI-R,  about  one  third  of  the  variance  for  the  CBCL-TRF  externalizing 
and  internalizing  scales,  and  about  half  for  the  SESBI  and  SESBI-R  Problem  Scales. 
Thus, the SESBI and SESBI-R Intensity Scale scores were the least affected by
individual teacher differences. The pattern of results indicates that the Problem Scales
are more sensitive to individual teacher differences than the Intensity Scales, which
suggests that the Problem Scale may tap into individual teacher stress levels and teacher
tolerance levels, as suggested by Eyberg.
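
The notion of teacher differences "accounting for" a share of the variance can be sketched as a simple ratio of sums of squares. The example below is a deliberately simplified one-way layout (eta squared for a single teacher factor), which is an assumption made for illustration; the ANOVA models in this study included additional factors. The ratings and the function name are hypothetical.

```python
def eta_squared(groups):
    """Proportion of total variance lying between groups (one-way layout)."""
    scores = [s for g in groups for s in g]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical ratings, one inner list per teacher (not study data).
ratings_by_teacher = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]

teacher_share = eta_squared(ratings_by_teacher)
print(f"share of variance between teachers = {teacher_share:.2f}")
```

A larger share means that knowing which teacher completed the form predicts more of the score, independent of the child being rated.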

The clinical cut off points for the SESBI and SESBI-R were determined using
the distributions of the ESE and regular classrooms. Interestingly, the cut off scores
obtained are similar to those found for the Eyberg Child Behavior Inventory, a similar
measure  used  for  parent  ratings  of  child  problems.  The  cut  off  scores  of  the  SESBI 
were  able  to  correctly  identify  a  large  percentage  of  the  children  in  ESE  classes. 
However,  the  cut  off  scores  also  indicated  that  a  large  number  of  children  in  regular 
classrooms had significant behavior problems. In contrast, the CBCL-TRF identified
far fewer of the ESE children as having behavioral problems, but it
also did not identify a large percentage of children in regular classrooms as having
significant behavior problems. Thus, the measure used for screening would depend on
the type of error one most wanted to avoid. The SESBI scales are more likely to
produce false positives, while the CBCL-TRF is likely to have a higher miss rate. In
multiple  gating  type  assessment,  however,  it  is  important  to  identify  children  with 
potentially  clinically  significant  behavior  problems  for  further  evaluation  rather  than 
miss  children  in  need  of  psychological  services. 
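
The trade-off between false positives and misses described above depends on where the cut off is placed. The following minimal sketch uses hypothetical scores and cut offs, not the values derived in this study, and it treats ESE placement as the criterion, as the discussion above does. It also assumes that a score at or above the cut off counts as identified.

```python
def screening_rates(ese_scores, regular_scores, cutoff):
    """Return (proportion of ESE children identified,
    proportion of regular-classroom children identified)."""
    hits = sum(s >= cutoff for s in ese_scores) / len(ese_scores)
    flagged = sum(s >= cutoff for s in regular_scores) / len(regular_scores)
    return hits, flagged


# Hypothetical Intensity Scale scores (not study data).
ese = [160, 175, 140, 190, 120]
regular = [80, 95, 150, 60, 110, 70, 155, 90]

low = screening_rates(ese, regular, cutoff=130)   # lenient cut off
high = screening_rates(ese, regular, cutoff=170)  # strict cut off

print(f"cut off 130: ESE identified {low[0]:.2f}, regular identified {low[1]:.2f}")
print(f"cut off 170: ESE identified {high[0]:.2f}, regular identified {high[1]:.2f}")
```

With these assumed scores, the lenient cut off identifies most ESE children but also flags some children in regular classrooms, while the strict cut off flags none of the regular-classroom children and misses more than half of the ESE group. Children flagged in regular classrooms are false positives only relative to the placement criterion; some may have genuine problems that warrant further evaluation.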

As with the SESBI and SESBI-R, children in ESE classes received higher
scores on both the internalizing and externalizing broad band scales of the CBCL-TRF.
On the externalizing scale, African American children obtained higher scores
than Caucasian children, and boys received higher ratings than girls. There was also
an interaction of grade, type of classroom, and child race; however, the pattern was
not readily interpretable. The differences in race have been largely neglected in the
research with the CBCL-TRF, as with most other measures of behavior. The
normative group for the CBCL-TRF is generally representative of the demographics of
the country; however, the United States is predominantly Caucasian. Therefore, race
differences may be washed out.

For  the  internalizing  scale,  problems  tended  to  increase  slightly  with  age  with 
the exception of a large jump at first grade. This jump could be due to the increased
demands placed on children when starting more formal schooling; problems then decline
in the second grade as the child adjusts to the school setting. Results showed that this
increase  in  internalizing  problems  is  larger  in  ESE  classes  than  regular  classes. 
Internalizing  problems  varied  by  grade  and  child  race  as  well,  although  there  did  not 
appear  to  be  a  particular  pattern  to  the  results. 

Factor  analysis  of  the  Intensity  Scale  of  the  SESBI-R  suggested  that,  although 
the  SESBI-R  is  a  general  measure  of  disruptive  behavior  problems,  it  also  has  two 
distinct factors that can be measured. The oblique rotation was chosen because it
allows factors to be correlated and thus has more theoretical value. Clinically,
Oppositional Defiant Disorder and Attention Deficit Hyperactivity Disorder are
frequently co-morbid in referred children (Abikoff & Klein, 1992; McMahon, 1994;
Schuhmann, Eyberg, Boggs, & Rayfield, 1996). The two factors accounted for a
large  amount  of  the  variance  and  most  items  loaded  well  onto  only  one  factor.  The 
factor  structure  obtained  was  similar  to  that  found  by  Rayfield  and  Eyberg  (1993), 
suggesting that the factor structure is stable and replicable. Concurrent validity for the
Attention Factor was provided by higher correlations between the Attention Factor and
the Attention Problems narrow-band syndrome scale than between the Oppositional
Factor and the Attention Problems narrow-band syndrome scale. Internal consistency
for the two factors was also high. Overall, the two-factor solution for the SESBI-R
appears to be replicable and valid.

This  study  provides  strong  psychometric  support  for  both  the  SESBI  and 
SESBI-R.  There  were  few  important  differences  between  the  two  measures.  The 
SESBI-R  appears  to  have  better  item  distribution  and  variability,  while  the  SESBI 
Intensity  Scale  score  was  more  stable  than  the  SESBI-R  Intensity  Scale  score  with  ESE 
children  due  to  the  less  stable  ADHD  factor.  The  SESBI-R  appears  to  provide  a  more 
stable factor structure than the SESBI has demonstrated in past studies. In addition,
previous research has suggested that the SESBI-R's item content may be slightly less age
specific, as intended by the authors of the SESBI. This is a particularly important
issue  for  much  needed  longitudinal  research  in  the  area  of  treatment  of  behavior 
problems  in  children  (McMahon,  1994).  Thus,  in  light  of  these  and  other  findings,  it 
is  recommended  that  the  SESBI-R  be  used  in  the  assessment  of  disruptive  behavior 
problems  in  the  classroom. 

The  inclusion  of  a  large  sample  of  children  with  known  behavior  problems  at 
school  is  a  strength  of  the  study.  In  addition,  this  research  explores  the  important 
issue  of  child  race,  which  has  received  little  attention  in  the  area  of  child  assessment. 
Even more neglected is the issue of teacher variability, including both individual
differences among teachers and their impact on ratings and more systematic differences
due to demographic variables such as race. Subsequent research should further examine
the variables of child race, teacher race, and their interactions with a larger group of
African American teachers. Also, although this study is among the first to consider
the effects of race on the assessment of behavior problems in children, its examination
of only Caucasian and African American teachers and children is limited. It
will be important to examine a broader array of ethnicities to determine what effects
ethnicity and culture have on the display and perception of behavior problems in
children.

This  study  also  established  empirical,  clinical  cut  off  points  for  the  SESBI 
scales.  It  will  be  important  to  replicate  the  ability  of  these  cut  off  points  to  correctly 
identify  children  with  problems  in  a  new  sample.  It  will  also  be  important  to 
determine  if  the  same  cut  off  scores  are  useful  for  children  in  grades  higher  than  5th 
grade. 


APPENDIX  A 
LETTER TO TEACHERS AND SESBI FORMS

Dear  Teacher, 

We  are  studying  a  measure  that  looks  at  behavior  problems  in  children.  To  do  this  we 
are asking teachers to answer some questions. The answers you give, along with
those of other teachers, will help us to learn about the kinds of behaviors that are
problems  for  teachers.  Your  answers  will  also  help  us  treat  children  with  behavior 
problems. 

Please  complete  the  following  questionnaires  on  8  children  in  your  classroom.  The 
children  should  be  2  African  American  boys,  2  Caucasian  boys,  2  African  American 
girls,  and  2  Caucasian  girls.  It  should  take  about  an  hour  to  complete  the  set  of 
questionnaires. Please don't put your students' names on the questionnaires, only their
number  in  your  grade  book.  When  you  have  completed  the  questionnaires,  place 
them  in  the  envelope  and  a  researcher  will  return  to  pick  them  up.  In  addition, 
someone  may  be  contacting  you  to  observe  the  children  you  rate. 

You  will  be  paid  a  $30  honorarium  in  appreciation  for  your  participation  in  the  study. 

If  you  have  any  questions  about  the  study,  please  call  Arista  Rayfield  at  (904)  395- 
0111  ext.  7-7918.  Thank  you  very  much  for  your  help  with  our  research. 

Sincerely, 


Arista  Rayfield,  M.S. 

Doctoral  Student 

Clinical  and  Health  Psychology 


Sheila  Eyberg,  Ph.D. 
Professor 

Clinical  and  Health  Psychology 




Child's Ethnicity
Child's Grade
Date of Rating
Child's Sex
Child's Grade book number

To your knowledge, does this student have a history of: (circle one)

a. Mental Retardation
b. Learning Disability (please specify type)
c. Slow Learner
d. Other Learning Problems (please specify)

SUTTER-EYBERG STUDENT BEHAVIOR INVENTORY - FORM E

Directions: Below are a series of phrases that describe children's behavior. Please (1) circle the number describing how often the behavior currently
occurs with this student, and (2) circle either "yes" or "no" to indicate whether the behavior is currently a problem.

Response columns printed beside every item:
How often does this occur with this student?  1  2  3  4  5  6  7  (anchors: Never, Seldom, Sometimes, Often, Always)
Is this a problem for you?  YES  NO

1. Has temper tantrums
2. Pouts
3. Teases or provokes other students
4. Lies
5. Hits teacher(s)
6. Acts frustrated with difficult tasks
7. Does not obey school rules on his/her own
8. Demands teacher attention
9. Dawdles in obeying rules or instructions
10. Acts bossy with other students
11. Gets angry when doesn't get his/her own way
12. Interrupts teachers
13. Has difficulty sharing materials
14. Impulsive, acts before thinking
15. Refuses to obey until threatened with punishment
16. Has difficulty staying on task
17. Blames others for problem behaviors
18. Yells or screams
19. Has difficulty entering groups
20. Is uncooperative in group activities
21. Is easily distracted
22. Has difficulty accepting criticism or correction
23. Fails to finish tasks or projects
24. Cries
25. Sasses teacher(s)
26. Verbally fights with other students
27. Destroys books and other objects
28. Whines
29. Is overactive or restless
30. Physically fights with other students
31. Makes noises in class
32. Is careless with books and other objects
33. Acts defiant when told to do something
34. Argues with teachers about rules or instructions
35. Interrupts other students
36. Steals
37. Destroys others' books or objects
38. Is noisy
39. Has trouble awaiting turn
40. Blurts out answers before question is complete
41. Threatens teachers with object or weapon
42. Talks excessively
43. Loses things needed for school activities
44. Fidgets or squirms in seat
45. Fails to listen to instructions
46. Is touchy or easily annoyed
47. Bothers others on purpose
48. Refuses coming or staying at school
49. Is spiteful or vindictive
50. Has trouble paying attention
51. Swears or uses obscene language
52. Has difficulty staying seated
53. Threatens peers with object or weapon


REFERENCES 


Abikoff, H., & Gittelman, R. (1985a). Classroom observation code: A
modification of the Stony Brook Code. Psychopharmacology Bulletin, 21, 901-909.

Abikoff, H., & Gittelman, R. (1985b). The normalizing effects of
methylphenidate on the classroom behavior of ADHD children. Journal of Abnormal
Child Psychology, 13, 33-44.

Abikoff, H., Gittelman, R., & Klein, D. F. (1980). Classroom observation
code for hyperactive children: A replication of validity. Journal of Consulting and
Clinical Psychology, 48, 555-565.

Abikoff, H., Gittelman-Klein, R., & Klein, D. F. (1977). Validation of a
classroom observation code for hyperactive children. Journal of Consulting and
Clinical Psychology, 45, 772-783.

Abikoff, H., & Klein, R. G. (1992). Attention-deficit hyperactivity and
conduct disorder: Comorbidity and implications for treatment. Journal of Consulting
and Clinical Psychology, 60, 881-892.

Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987).
Child/adolescent behavioral and emotional problems: Implications of cross-informant
correlations for situational specificity. Psychological Bulletin, 101, 213-232.

American Psychiatric Association. (1987). Diagnostic and statistical manual of
mental disorders (3rd ed., rev.). Washington, DC: Author.

American Psychiatric Association. (1994). Diagnostic and statistical manual of
mental disorders (4th ed.). Washington, DC: Author.

Anastasi, A. (1988). Psychological testing. New York: Macmillan Publishing
Company.

Barkley, R. A. (1988). Child behavior rating scales and checklists. In M.
Rutter, A. H. Tuma, & I. S. Lann (Eds.), Assessment and diagnosis in child
psychopathology (pp. 113-155). New York: Guilford Press.




Barkley, R. A. (1990). Behavior rating scales. In R. A. Barkley (Ed.),
Attention deficit hyperactivity disorder: A handbook for diagnosis and treatment (pp.
278-326). New York: Guilford Press.

Behar, L., & Stringfield, S. (1974). A behavior rating scale for the preschool
child. Developmental Psychology, 10, 601-610.

Breiner, J., & Forehand, R. (1981). An assessment of the effects of parent
training on clinic referred children's school behavior. Behavioral Assessment, 3,
31-42.

Burns, G. L., & Owen, S. M. (1990). Disruptive behaviors in the classroom:
Initial standardization of a new teacher rating scale. Journal of Abnormal Child
Psychology, 18, 515-525.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant
validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Connell, H. M., Irvine, L., & Rodney, J. (1982). Prevalence of psychiatric
disorder in rural school children. Australian and New Zealand Journal of Psychiatry,
16, 43-46.

Conners, C. K. (1969). A teacher rating scale for use in drug studies with
children. American Journal of Psychiatry, 126, 884-888.

Conners, C. K. (1976). Rating scales for use in drug studies with children.
Psychopharmacology Bulletin: Special Issue, Psycho-pharmacology with Children, 59,
24-84.

Dumas, J. (1992). [SESBI scores for 144 non-referred preschoolers by age and
sex]. Unpublished raw data.

Edelbrock, C., & Achenbach, T. M. (1984). The teacher version of the Child
Behavior Profile: I. Boys aged 6-11. Journal of Consulting and Clinical Psychology,
52, 207-217.

Eyberg, S. (1989, August). Sutter-Eyberg student behavior inventory: A
teacher rating scale of conduct problem behaviors. Paper presented at the meeting of
the American Psychological Association, New Orleans, LA.

Eyberg, S. M. (in press). Eyberg Child Behavior Inventory and Sutter-Eyberg
Student Behavior Inventory: Professional Manual. Odessa, FL: Psychological
Assessment Resources.



Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York:
Wiley.

Freniere, P. J., Dumas, J. E., Capuanto, P., & Dubeau, D. (1992).
Development and validation of the preschool socio-affective profile. Psychological
Assessment, 4, 442-450.

Funderburk, B. W., & Eyberg, S. M. (1989). Psychometric characteristics of
the Sutter-Eyberg student behavior inventory: A school behavior rating scale for use
with preschool children. Behavioral Assessment, 11, 297-313.

Funderburk, B. W., Eyberg, S. M., & Behar, L. (1989, August).
Psychometric properties of the SESBI with high-SES preschoolers. Paper presented at
the meeting of the American Psychological Association, New Orleans, LA.

Hollon, S. D., & Flick, S. N. (1988). On the meaning and methods of clinical
significance. Behavioral Assessment, 10, 197-206.

Hugdahl, K., & Ost, L. (1981). On the difference between statistical and
clinical significance. Behavioral Assessment, 3, 289-295.

Jacob, R. G., O'Leary, K. D., & Rosenblad, C. (1978). Formal and informal
classroom settings: Effects on hyperactivity. Journal of Abnormal Child
Psychology, 6, 47-59.

Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1984). Psychotherapy
outcome research: Methods for reporting variability and evaluating clinical
significance. Behavior Therapy, 15, 336-352.

Jacobson, N. S., Follette, W. C., & Revenstorf, D. (1986). Towards a standard
definition of clinically significant change. Behavior Therapy, 17, 308-311.

Jacobson, N. S., & Revenstorf, D. (1988). Statistics for assessing the clinical
significance of psychotherapy techniques: Issues, problems, and new developments.
Behavioral Assessment, 10, 133-145.

Jacobson, N. S., & Truax, P. (1992). Clinical significance: A statistical
approach to defining meaningful change in psychotherapy research. In A. E. Kazdin
(Ed.), Methodological issues in clinical research (pp. 631-648). Washington, DC:
American Psychological Association.

Kazdin, A. E. (1977). Assessing the clinical or applied importance of behavior
change through social validation. Behavior Modification, 1, 427-451.




Kazdin,  A.E.,  Esveldt-Dawson,  K.,  Loar,  L.  (1983).  Correspondence  of 
teacher  ratings  and  direct  observations  of  classroom  behavior  of  psychiatric  inpatient 
children.  Journal  of  Abnormal  Child  Psychology,  11,  549-564. 

Kazdin,  A.  E.,  Mazurick,  J.  L.,  &  Bass,  D.  (1993).  Risk  for  attrition  in 
treatment  of  antisocial  children  and  families.  Journal  of  Clinical  and  Consulting 
Psychology.  22,  2-16. 

Kazdin,  A.  E.,  Seigel,  T.  C,  &  Bass,  D.  (1990).  Drawing  upon  clinical 
practice  to  inform  research  on  child  and  adolescent  psychotherapy:  A  survey  of 
practitioners.  Professional  Psvchologv:  Research  and  Practice.  21.  189-198. 

Ladish,  C,  Sosna,  T.  D.,  Warner,  D.,  &  Burns,  G.  L.  (August,  1989). 
Psychometric  properties  of  the  Sutter-Eyberg  student  behavior  inventory  in  a  preschool 
sample.  Paper  presented  at  the  meeting  of  the  American  Psychological  Association, 
New  Orleans,  LA. 

Loeber,  R.  (1982).  The  stability  of  anti-social  and  delinquent  behavior:  A 
review.  Child  Development.  53,  1431-1446. 

Loeber,  R.,  Green,  S.,  &  Lahey,  B.  (1990).  Mental  health  professionals' 
perception  of  the  utility  of  children,  mothers,  and  teachers  as  informants  of  childhood 
psychopathology.  Journal  of  Clinical  Child  Psvchologv.  19,  136-143. 

Lucas, C. P. (1992). The order effect: Reflections on the validity of multiple
test presentations. Psychological Medicine, 22, 197-202.

Macmann, G. M., Barnett, D. W., Burd, S. A., Jones, T., LeBuffe, P. A.,
O'Mally, D., Shade, D. B., & Wright, A. (1992). Construct validity of the
Child Behavior Checklist: Effects of item overlap on second-order factor
structure. Psychological Assessment, 4(1), 113-116.

Mash, E. J. (1989). Treatment of child and family disturbance: A behavioral-
systems perspective. In E. J. Mash & R. A. Barkley (Eds.), Treatment of
childhood  disorders  (pp.  3-36).  New  York:  Guilford  Press. 

McMahon,  R.  J.  (1994).  Diagnosis,  assessment,  and  treatment  of  externalizing 
problems in children: The role of longitudinal data. Journal of Consulting and Clinical
Psychology, 62, 901-917.

McMahon, R. J., & Forehand, R. (1988). Conduct disorders. In E. J. Mash &
L.  G.  Terdal  (Eds.),  Behavioral  assessment  of  childhood  disorders  (pp.  105-153). 
New  York:  Guilford  Press. 




McMahon,  R.J.,  &  Wells,  K.C.  (1989).  Conduct  disorders.  In  E.  J.  Mash  & 
R.  A.  Barkley  (Eds.),  Treatment  of  childhood  disorders  (pp.  73-132).  New  York: 
Guilford  Press. 

McNeil,  C.  B.,  Eyberg,  S.,  Eisenstadt,  T.  H.,  Newcomb,  K.,  &  Funderburk, 
B.  (1991).  Parent-Child  Interaction  Therapy  with  behavior  problem  children: 
Generalization  of  treatment  effects  to  the  school  setting.  Journal  of  Clinical  Child 
Psychology, 20(2), 140-151.

Newcomb, K. P., Eyberg, S. M., Bodiford, C. A., Eisenstadt, T. H., &
Funderburk,  B.  W.  (1989).  SESBI  and  classroom  behavioral  observations.  Paper 
presented  at  the  meeting  of  the  American  Psychological  Association,  New  Orleans, 
LA. 

Offord,  D.R.,  Boyle,  M.H.,  &  Racine,  Y.  (1989).  Ontario  child  health  study: 
Correlates of disorder. American Academy of Child and Adolescent Psychiatry,
856-860.

Patterson,  G.  R.  (1986).  Performance  models  for  antisocial  boys.  American 
Psychologist, 41(4), 432-444.

Platzman, K., Stoy, M., Brown, R., Coles, C., Smith, I., & Falek, A.
(1992). Review of observational methods in attention deficit hyperactivity disorder
(ADHD): Implications for diagnosis. School Psychology Quarterly, 7(3), 155-177.

Quay,  H.  C.  (1986).  Conduct  disorders.  In  H.  C.  Quay  &  J.  S.  Werry 
(Eds.), Psychopathological Disorders of Childhood (pp. 35-72). New York: Wiley.

Rayfield, A. (1993). Standardization of the Sutter-Eyberg Student Behavior
Inventory. Unpublished master's thesis, University of Florida, Gainesville.

Rayfield, A., & Eyberg, S. (1996). Standardization of the SESBI with rural
middle school and high school students. Manuscript submitted for publication.

Reid, J.B. (1978). A social learning approach to family intervention: II.
Observation in home settings. Eugene, OR: Castalia.

Reid,  J.B.,  Patterson,  G.R.,  Baldwin,  D.V.,  &  Dishion,  T.J.  (1988). 
Observations in the assessment of childhood disorders. In M. Rutter, A. H. Tuma, &
I. S. Lann (Eds.), Assessment and diagnosis in child psychopathology (pp. 156-195).
New  York:  Guilford  Press. 

Rigelhaupt, J.D., Boggs, S.R., Eyberg, S.M., & Edwards, D. (1994,
November). Reliability and validity of the Revised Edition of the School Observation
Coding System (REdSOCS). Poster presented at the meeting of the Association for
the Advancement of Behavior Therapy, San Diego.

Robins,  L.  N.  (1966).  The  adult  development  of  the  anti-social  child. 
Seminars in Psychiatry, 2, 420-434.

Robinson,  A.,  Eyberg,  S.  M.,  &  Ross,  A.  W.  (1980).  The  standardization  of 
an  inventory  of  child  conduct  problem  behaviors.  Journal  of  Clinical  Child 
Psychology, 9, 22-28.

Rutter, M., Cox, A., Tupling, C., Berger, M., & Yule, W. (1975).
Attainment  and  adjustment  in  two  geographical  areas.  I.  The  prevalence  of  psychiatric 
disorder. British Journal of Psychiatry, 126, 493-509.

Sattler, J. (1986). Assessment of children (Rev. ed.). San Diego:
Jerome M. Sattler.

Schaughency, E. A., Hurley, L. K., Yano, K. E., Seeley, J., & Talarico, B.
(1989). Psychometric properties of the SESBI with clinic-referred children. Paper
presented at the meeting of the American Psychological Association, New Orleans,
LA.

Schuhmann, E.M., Durning, P.E., Eyberg, S.M., & Boggs, S.R. (in press).
Screening  for  conduct  problem  behavior  in  pediatric  settings  using  the  Eyberg  Child 
Behavior  Inventory.  Ambulatory  Child  Health. 

Schuhmann, E.M., Eyberg, S.M., Boggs, S.R., & Rayfield, A. (1996).
Oppositional Defiant Disorder with and without concomitant Attention Deficit
Hyperactivity Disorder: Clinical characteristics and response to treatment.
Manuscript submitted for publication.

Toro, P.A., Weissberg, R.P., Guare, J., & Liebenstein, N.L. (1990). A
comparison of children with and without learning disabilities on social problem
solving skill, school behavior, and family background. Journal of Learning
Disabilities, 23, 115-120.

Trites,  R.  L.,  Blouin,  A.  G.,  Ferguson,  H.  B.,  &  Lynch,  G.  W.  (1981).  The 
Conners  Teacher  Rating  Scale:  An  epidemiological  inter-rater  reliability  and  follow-up 
investigation. In K. Gadow & J. Loney (Eds.), Psychosocial aspects of drug
treatment for hyperactivity. Boulder, CO: Westview Press.

Walker,  H.  M.,  Shinn,  M.  R.,  O'Neill,  R.E.,  &  Ramsey,  E.  (1986).  A 
longitudinal assessment of the development of antisocial behavior in boys: Rationale,
methodology, and first year results. Unpublished manuscript.




Wampold,  B.E.,  &  Jensen,  W.R.  (1986).  Clinical  significance  revisited. 
Behavior  Therapy,  17,  302-311. 

Weinrott,  M.R.,  &  Jones,  R.R.  (1984).  Overt  versus  covert  assessment  of 
observer reliability. Child Development, 55, 1125-1137.

Wells, K. C., & Forehand, R. (1985). Conduct and oppositional disorders.
In P. H. Bornstein & A. Kazdin (Eds.), Handbook of clinical
behavior therapy with children (pp. 218-265). Homewood, IL: Dorsey Press.


Wolff,  S.  (1971).  Dimensions  and  clusters  of  symptoms  in  disturbed  children. 
British Journal of Psychiatry, 118, 421-427.


BIOGRAPHICAL  SKETCH 
Arista  Dianne  Rayfield  was  born  February  19,  1969,  in  Sylacauga,  Alabama, 
to  Kenneth  and  Cecile  Beasley.  Arista  majored  in  psychology  at  the  University  of 
Alabama  at  Birmingham  and  graduated  cum  laude  with  a  Bachelor  of  Science  degree 
in  June  1991.  Arista  entered  the  Clinical  and  Health  Psychology  doctoral  program  at 
the  University  of  Florida  in  August  1991.  She  completed  her  predoctoral  internship  at 
the  Medical  College  of  Georgia  and  Veterans'  Administration  Consortium  in  Augusta, 
Georgia  in  July  1996.  After  graduation,  Arista  plans  to  remain  at  the  University  of 
Florida  as  a  postdoctoral  associate  in  the  Department  of  Psychiatry. 




I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Sheila  M.  Eyberg,  Chair 
Professor  of  Clinical  and  Health 
Psychology 

I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Education 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Stephen  R.  Boggs 
Associate  Professor  of 
Clinical  and  Health  Psychology 

I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Associate  Professor  of 
Clinical  and  Health  Psychology 


I  certify  that  I  have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and  quality, 
as  a  dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Suzanne B. Johnson
Professor of Clinical and Health
Psychology

This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  College  of  Health 
Professions  and  to  the  Graduate  School  and  was  accepted  as  partial  fulfillment  of  the 
requirements  for  the  degree  of  Doctor  of  Philosophy. 


May  1997 


Dean,  College  of  Health  Professions 


Dean,  Graduate  School 


